XML/ 0000755 0001751 0000144 00000000000 14636540175 010743 5 ustar hornik users XML/README 0000644 0001751 0000144 00000000774 13607633674 011637 0 ustar hornik users The Packages/ directory has some package source tar.gz files.
See index.html for a description of the package and the installation
procedures.
This R package is not in the R package format in the github repository.
It was initially developed in 1999 and was intended for use in both
S-Plus and R and so requires a different structure for each.
make ADMIN=1
copies the files to an appropriate structure for R.
It currently requires some supporting tools from the Omegahat
admin facilities.
XML/MD5 0000644 0001751 0000144 00000053623 14636540175 011264 0 ustar hornik users a4b42cfd693ca4bfe46096018d887cbf *ChangeLog
4ce0142d918f335c54bd04b8271a50bf *DESCRIPTION
82de24270a90587d5492f8d12e85abcd *LICENSE
38d88c89e954d242754f758e149248e4 *NAMESPACE
b29605c39dd87687b2928668b9b52c6d *R/AAA.R
05aace6af9a5a1b841570d5af955146f *R/DTD.R
ff35133bf0deee2608952a104a91a913 *R/DTDClasses.R
2765bfebaefb9dd88753f0f46983980d *R/DTDRef.R
141db4feac6fc837cb1cb681f2374444 *R/SAXMethods.R
38a8b2e07bbc13e081d32f7c252de8c6 *R/XMLClasses.R
2f7e79941ed0b53c43367ba90cc3b7bd *R/XMLRErrorInfo.R
ac3aac6bc1944fff50ee7c5522154a21 *R/applyDOM.R
f6b79f4f7155fcc58b09178f906cd584 *R/assignChild.R
231e85de6cb9ea078ef0cb2c2213108d *R/bitList.R
a15627f62c15dddff63f50d038855fe4 *R/catalog.R
284dc88fce56e922416b0f8a4f4c93e6 *R/compare.R
d45c7ea9a3026b309a258ffa38a2bce5 *R/createNode.R
1dea4cf235af2cf05e1fb01f2b5a71d4 *R/dynSupports.R
8401b221124b4efb00ee989369522356 *R/encoding.R
e6e506d5fee0a21b59c2fddcf652699e *R/error.R
1ac5e56a5e6c8f5876f8078d3340aebf *R/fixNS.R
e2d9c2def91560514649a3cca62e5f06 *R/flatTree.R
f8b6ca9e074f7d121814fb3cfd4f0d3c *R/getDependencies.R
762fd9017f65818096f8a5967bde9336 *R/getRelativeURL.R
a6bf923a5a5e66ca887ecd9dbccdab46 *R/hashTree.R
c1001a4a49a44e593ae73d9dd2da4873 *R/htmlLinks.R
9865b82f2c355021d696f2a219997dd3 *R/htmlLists.R
31aba90b32ddddcf6c97822bd23cf283 *R/htmlParse.R
319ad3af61d22bfb1b18b17889fea6c9 *R/keyValueDB.R
87fc1f8084549a2a96aeac61d19ef396 *R/libxmlFeatures.R
694bac4ce6060c28841e70a4715f0a33 *R/namespaceHandlers.R
57957d2e3fe9252d2df2d88a70b9f1dc *R/namespaces.R
1eb950a6be727681717a28964c7d43e6 *R/nodeAccessors.R
0a08d749c99681148ef8c635de0cd6cd *R/parseDTD.R
719bbd175850d2c345fe7245c45c7605 *R/parser.R
1da99c4595f2cad8d7c92c3916973750 *R/readHTMLTable.R
ed3eddf359449369643ee34b7953fb4a *R/reflection.R
9dc1ee46d1277d6edfb4e846c89663c0 *R/saveXML.R
b342c8113cd53470a81a31fbaf5e6b8e *R/schema.R
da2153fc4017a2360b0f0ed2c5df3a23 *R/serialize.R
2fb09d82bbc6ab332646981cada8eee3 *R/simplifyPath.R
0c6410b1686f5da871fc704f2705ff6f *R/solrDocs.R
ec6b4d13943f86381db6aa1583465fdb *R/summary.R
1eced5424f333bdd41bc79db551556d2 *R/supports.R.in
fead2c9d63dc1a7acf3244431d84e1e0 *R/tangle.R
033f27eaa77cba260e69fabe55c0c2e6 *R/tangle1.R
9bc0920c6ab4c3011613d6e6a4d8030a *R/toString.R
3c83404c574ba491c4a8238cba41defc *R/tree.R
09e4d0bea29cff3b118f144b6d394c4a *R/version.R
4f4954b191d7593ac720220b9ec0117d *R/xincludes.R
05cf14141f68e19441882cf078c8fecb *R/xmlErrorEnums.R
65d281b1b17dcac2c51b57a044337d09 *R/xmlEventHandler.R
4f09b46f6b9f80ecd9eb3e35ab961594 *R/xmlEventParse.R
e7aa7373fed505f7521b09824262b821 *R/xmlHandler.R
708a0b6988ed9e30f27724f56bf4610b *R/xmlIncludes.R
03170fa934f8adceed4c050d77bf0545 *R/xmlInternalSource.R
9d5a05b12bcffd8afe0b4bb154ecc7c2 *R/xmlMemoryMgmt.R
85f9b9b72ed10849323c8b8b8ee23828 *R/xmlNodes.R
468ff02fdbebfe9f72ee1d7b45cecfd0 *R/xmlOutputBuffer.R
1111ef217cf4adc2859e1bb378a46ed8 *R/xmlOutputDOM.R
58e1c8f1d96c1b01ac399ce0bee9b42e *R/xmlRoot.R
d79e29581564fcb906eb9900437a8ee2 *R/xmlString.R
8d44500d5b5a04fa60301d575839a542 *R/xmlToDataFrame.R
8b33b9583824359c60b2f7f54f753d03 *R/xmlTree.R
c1909bbb22618650552d7c9cca20d119 *R/xmlTreeParse.R
4ac67423f4eb47a3b0d2817f8e9d082a *R/zzz.R
081453acd1b3e6642a8ba518d2d14251 *README
b0777f8f8a90181ea55c9efd274ba7d3 *cleanup
fb248a085dbe1cc71c22ef263485d56b *configure
d32e082bc9f133816d2388ff5352cfd9 *configure.ac
42f539f105aa0e7da29b7a209beba941 *configure.win
0b3b8ce6fd20c02d923efa867f9932c9 *inst/COPYRIGHTS
ff54e5b8500551dcb3239e7054a77001 *inst/FAQ.html
44d42d43addc439811b65d03ac7b4619 *inst/exampleData/9003-en.html
cf0768ed8a4e59a342242b6c1d33c035 *inst/exampleData/9003.html
b80e1ae6db7489cf7f5e6b48af480546 *inst/exampleData/GNUmakefile
a1e81bd1bc29f244d35afa4b6efed92e *inst/exampleData/README
2d6e8210d1eafecdea788df2bd2e0384 *inst/exampleData/Rref.xml
d37680440cb95ae187a12d15f51cb10e *inst/exampleData/Rsource.xml
dbfedc92e65ebdbbb2ed7d7a468fb894 *inst/exampleData/SOAPNamespaces.xml
79ce6fe72b75c6968eed74a920c04bb9 *inst/exampleData/StatModel.dtd
efedd98237d8a6c2f279becd9b50a0e9 *inst/exampleData/TestInvalid.xml
4ee6988da664e5930ee670c6930fe337 *inst/exampleData/allNodeTypes.xml
7f50ccdaf78c88a6b2b0b5bfcddb1fea *inst/exampleData/author.xml
4a8057b4fb9de7d5b82228bcd2a62e68 *inst/exampleData/author.xsd
610921f9e4798e229b7143e403147219 *inst/exampleData/author1.xml
1120f2d2733484f09680f85c33378602 *inst/exampleData/author2.xml
2fde199a80305fc2ae20824a69b87e66 *inst/exampleData/author2.xsd
bb728a400627b16058a4a14e64fd180b *inst/exampleData/basic.xml
2bf5de346f3b02da2527ae0f7706bc49 *inst/exampleData/book.xml
ac36cd5afa45e76057c1fec502e857e0 *inst/exampleData/boxplot.svg
0b8cd77a8e89cccef6288c25bf1c2655 *inst/exampleData/branch.xml
8db93ffa949d3a52c580b11fb5712d26 *inst/exampleData/catalog.xml
ebf7cbd275268fee6d99255d40b92815 *inst/exampleData/cdata.xml
c4aea8311842ec63cba5bbe11246b9ee *inst/exampleData/charts.svg
2181f5ee7e349d271a3895bcc9ff4961 *inst/exampleData/cleanNamespace.xml
6364ba6fe7ca258455db30aa3aaab459 *inst/exampleData/content.html
44dc03ad2980890cfc2a7f14d4134897 *inst/exampleData/dataframe.xml
8c0e739d76e85bc31f5fff42e3b31fb3 *inst/exampleData/dot.xml
c70532ba974608092ab3f05421ec39bd *inst/exampleData/dtd.zip
4af9e6abc00fe42e0b719b0c92fcd855 *inst/exampleData/entity.xml
c710cbd83e3fa6f387a85a53bac1f121 *inst/exampleData/entity1.xml
6ad56e424028ad33c81a8c061676c9a9 *inst/exampleData/entity2.xml
3287218d12d1191c46e01ccb8bab42ab *inst/exampleData/entity3.html
6ead5e2aa1d5b4d042377296215667b8 *inst/exampleData/entity4.xml
de5ee7ed01d3ddd6b11c933f01fcee17 *inst/exampleData/entityValue
06cfddecffe04977b837494622ac1f44 *inst/exampleData/eurofxref-hist.xml.gz
cd2944d52234fe6890aea6b1f78e40b1 *inst/exampleData/event.xml
e8eeb195309a9204d1ade99130943e81 *inst/exampleData/foo.dtd
83e2cde65b0837ddc64b9c570e9ecf05 *inst/exampleData/functionTemplate.xml
8ceea19ebb7722fece59b070c0fefaab *inst/exampleData/generalInfo.xml
68036459f5b8f80421c0de4a9ece9e2a *inst/exampleData/gnumeric.xml
b58a3cce66f3aba2abc4a621fe872aea *inst/exampleData/graph.gxl
3b8924fc68699274888160bb50c89096 *inst/exampleData/iTunes.plist
beb24da8ab9fa7ec5e47559f9320f1cc *inst/exampleData/include.xml
aadeec466fab1bc2067e9b8889fc785d *inst/exampleData/job.xml
f249ead2193d3976bff318096ed656b8 *inst/exampleData/kiva_lender.xml
26ed8c871e37dbeb7169258b4d597a28 *inst/exampleData/largeText.xml
2dc1e4b0db1ad4449c3e43413c8e828a *inst/exampleData/lists.html
b1aa2e1d35e3071c2ec9caa85194da7b *inst/exampleData/literate.dtd
f19eea9e1a763e11875950294e12f21f *inst/exampleData/literate.xml
d8f5b8a714757af5df3b3c4a0502ccb3 *inst/exampleData/literate.xsl
08361bfe9a264b0c5dc22a04420072cb *inst/exampleData/literateFunction.xml
48f8b27de68bbe818671542ac4eea117 *inst/exampleData/longitudinalData.xml
2925b45e6f9babc05365f96f748eca7b *inst/exampleData/malformed.xml
de5f042eb58448551ee1722ae736dc61 *inst/exampleData/mathml.dtd
2cca5ea86193e8e2ba52c020716cbcb5 *inst/exampleData/mathml.xml
ea5e8caa00546abf0638ad1b0e187476 *inst/exampleData/mathmlFuncCall.xml
9600ca143954e485e1316b823a991e42 *inst/exampleData/mathmlInt.xml
75d5b4c2fcb667d551e0385eaa6bba74 *inst/exampleData/mathmlMatrix.xml
d9981b76cdf98ed2c6969b00f3fcac3a *inst/exampleData/mathmlQuadratic.xml
08b828cd2b6d0695b1442729af4794f5 *inst/exampleData/mathmlRoot.xml
0e3b0f32d9ee397056425db4379418db *inst/exampleData/mathmlScript.xml
619b193c39c7b0af1bf2453d22d2c6aa *inst/exampleData/mathmlSet.xml
36dbc35da2f1dcda612a0c83eb2eac4c *inst/exampleData/mathmlSimple.xml
432ab4c8ad737186dc47dd6eef00289d *inst/exampleData/mathmlSphere.xml
db81cd572c5679982ccee3e65beca0db *inst/exampleData/mathmlSums.xml
24002a1ef0ab95a7dd73270e95ff404a *inst/exampleData/matrixMult.xml
3f8c8d90f7632339e3dc6321a471a027 *inst/exampleData/mtcars.xml
3c564854baf7380ebc86b55ff1951b91 *inst/exampleData/namespaceHandlers.xml
ad9ab25a06187df3b0a6c81d8f7de200 *inst/exampleData/namespaces.xml
9ad8af7e80c6e1a7ec7353661a17a909 *inst/exampleData/namespaces1.xml
da759173f765e3cbafb729d21c24e050 *inst/exampleData/namespaces2.xml
a24f30320d88a7806a86183c2f8b5244 *inst/exampleData/nodes.xml
0e91b5beb9ff7cc56e303fac65c93842 *inst/exampleData/nodes1.xml
f1426b65547d2c366031079e0cccbd1c *inst/exampleData/nodes2.xml
afe9ec03e1abb8ff5c5b09ae45fa9335 *inst/exampleData/nodes3.xml
af7b231bf764cb1bb31e71d2ed5c9d3b *inst/exampleData/nsAttrs.xml
e669db449eb9a4a6cb2cfb1d2842f559 *inst/exampleData/plist.xml
5e531be512a447d9b886f36c5bd8c5de *inst/exampleData/raw.xml
9e52d4d5f4bba538e0306c6a22830bf6 *inst/exampleData/redundantNS.xml
21b05839f6a1bb2de0d8536a213528d2 *inst/exampleData/reparent.xml
88d6fd716774a6c44f212980b8bf0c7d *inst/exampleData/rhelp.xsl
9bd7e5d34b9ddd237bbc7745ea0199e2 *inst/exampleData/rxinclude.xml
183bf612ca6e5c890229931f453fdd8f *inst/exampleData/same.xml
0742028b822457bf861587c39b1cc1ad *inst/exampleData/setInterval.xml
b249f35607ad13d2ef1bab2393d0004b *inst/exampleData/simple.plist
fc3d1a8275735fd4427e939165c7e652 *inst/exampleData/simple.xml
e3e5697d89ad0a2bd0b0c41ce2c1266e *inst/exampleData/size.xml
590f58281834abdd7c526e9b215750a4 *inst/exampleData/size1.xml
d55386fde14974c6baa010887e612430 *inst/exampleData/size2.xml
d67895692910308f94011b9f5fb0152d *inst/exampleData/size3.xml
906f785af884bfd1aec11e5acda4c9a3 *inst/exampleData/solr.xml
752af23b592f34dcea376d6a974db2bd *inst/exampleData/something.xml
0403c6eb52393ae26dcae0469e2633ec *inst/exampleData/svg.xml
c8641303870984426f793cbb6e429e7d *inst/exampleData/symslines.svg
1bdc99465a3308524b52c880708b5178 *inst/exampleData/tagnames.xml
7e2c79f6142315de68284c4c11f4f6b3 *inst/exampleData/test.xml
af855018d145cf15c003a16d90e0916c *inst/exampleData/test1.xml
597371c880c3f567bda055a9e55fa17c *inst/exampleData/tides.xml
5f134aa5c6fcdcf8d2f26e5b60ac9120 *inst/exampleData/utf.xml
8da18b9854857f77beab31ceeb065c1b *inst/exampleData/vars.xml
7e84b65ee6af58a74a2762687a15ff12 *inst/exampleData/writeRS.S
0de92f65f4b930fe9771ef6796b4583c *inst/exampleData/writeRS.xml
b5e2c9cb6f41ddae5481f168761cdecb *inst/exampleData/xinclude/a.xml
a2dbd9063592a474076a4fa13f33b966 *inst/exampleData/xinclude/b.xml
7e9d4b62b63326ad2b043393c4fc0b4b *inst/exampleData/xinclude/c.xml
ee12821458852845d84ba37f3f8012d7 *inst/exampleData/xinclude/d.xml
5ecd9593d72f87c8648e2ab7be5814cc *inst/exampleData/xinclude/e.xml
30b3e0c2056e41e7ea8bc2e14b5b361e *inst/exampleData/xinclude/simple.xml
3710d0c649035714b6e89873326951b7 *inst/exampleData/xinclude/top.xml
628b2098d6e59d8964df122215f286ad *inst/exampleData/xpathTest.xml
2bd5f311c3cf42bfdcc17a94f41f4b21 *inst/exampleData/xysize.svg
f3cde0f326bae48239155a8f003086f4 *inst/exampleData/xyz.svg.gz
4cc0a05705b8d796cd18d86e23976e5b *inst/examples/CIS.R
0a0425fc2df2e3438c9c226ecda22eb4 *inst/examples/DiGIR.R
b741c69ecefb8a58aa829974daffe87b *inst/examples/GNUmakefile
a9008dcdfa2107e5c79f7f42b83f0d54 *inst/examples/HTMLText.R
398a829f0d66ce743eab9505cb6490e5 *inst/examples/README
9e8c2b2daa4c88bfa4d2a07de014339c *inst/examples/Rhelp.xml
1ab905aac19cb951a9c3ff5a184d50fc *inst/examples/RhelpArchive.xml
a17eeb45d631e58e1e785439d1abd62f *inst/examples/RhelpInfo.xml
74e02084be0d639663d0955fee5e664f *inst/examples/SAXEntity.R
ca5f6777ece68d2225789bf573ab7aa6 *inst/examples/SSource.dtd
931407aa812b6b22dd8734e3b861a541 *inst/examples/author.R
e0110777dd3603557e1156f003a70175 *inst/examples/bondYields.R
eb96f280983ffc7eda3b75d6fdf28fe7 *inst/examples/bondsTables.R
93e43213dcc1c972d7ab3bb5acb9ac07 *inst/examples/catalog.R
7b96bde0f308701b272590496b1a4b4c *inst/examples/connections.R
1d881dcf97c18a05a545d8caf9cbd589 *inst/examples/connections1.R
193722245b150f003aa6c0ed67e49acc *inst/examples/createTree.R
6b11fbe49509cd09c20b433d8b1b9fca *inst/examples/createTree1.R
fea7a47b2f17464a36e897bb4dc148b7 *inst/examples/dataFrameEvent.R
dc1d21cbc5b7051e4120c2a49849dc0c *inst/examples/docbook.R
8961e4e9d970588050e80de8ccfd54a6 *inst/examples/ecb.R
aa2a998a56d5b948ef52c9f7055b2d19 *inst/examples/enhancedDataFrame.R
6a8a69608880f8a1bc7382e4f29e9e8e *inst/examples/event.R
6225a3cfd7f010c79cb46f016c4d39ad *inst/examples/event.S
bbc46baf6ab9a671b8a32f9c93f44c14 *inst/examples/eventHandlers.R
53a90c1a32cd469dbf664a5ef69ecc1c *inst/examples/faq.R
da0e4b523962d2b8fee68d4b404d2462 *inst/examples/filterDataFrameEvent.R
7d747d516d2778419b4cd500d21eb65a *inst/examples/foo.html
49ecb1ec6023519bb310a2bdd1ff6bd1 *inst/examples/formals.xsl
553ea13eaaf4ec9e1708c7d5027af3df *inst/examples/functionIndex.Sxml
dddaf71eb6703b73c24426f388c24218 *inst/examples/genericHandlers.R
27f8cdf1d2df4efdd8706065331ac66b *inst/examples/generic_file.xml
98237d086b3bb907ccfce49c0eba1b87 *inst/examples/getElements.S
963cfec2a57ecb7ab040a48f4490f3e8 *inst/examples/gettingStarted.html
1cc8cb28e54bddab52bc403b0b4fd069 *inst/examples/gettingStarted.xml
ebe5014777483064b65331f7d5704586 *inst/examples/gnumericHandler.R
0e286bc72b43f41dff67db9f4cb09fb8 *inst/examples/hashTree.R
e729098f8a9a2045e9e138d56eb9b94d *inst/examples/iTunes.plist
e1d4464f2dc9db55104be454f9f0ee93 *inst/examples/index.html
52a2f734918edcd65e9ff3e6d5d7aea2 *inst/examples/internalNodes.S
cb2a1471077fa5d25a8d0b29930359bc *inst/examples/internalXInclude.xml
22dece19680308448b99e26a550450e2 *inst/examples/itunes.R
5a44f0bbe6a34309872a5c09b17aa269 *inst/examples/itunes.xsl
b973f589468083bf1c9031afd90ed3bd *inst/examples/itunesSax.R
5b9ea2b9ef39d14f3e5eba4e382c5d7c *inst/examples/itunesSax1.R
c365f368746b5a38ad7c81ac535e67b0 *inst/examples/itunesSax2.R
06b6ff5309b53e0d1f482a749fb9d27e *inst/examples/mathml.R
f506a024b31c84e2deddd594240bc142 *inst/examples/mathmlPlot.R
4be116184764542572c1e4d5a0004bf9 *inst/examples/metlin.R
80a2167ebaf00ee4946e11e5b3fa0c7c *inst/examples/mexico.R
41d008fff79c428b8b2139d80af8710d *inst/examples/mexico.xml
a8df416e57318da4eea96ab5715ab44b *inst/examples/mi1.R
f2cc473a1e09c120f941e796a4e653b2 *inst/examples/modified_itunes_sax.R
bc3ab0b0426862cc3d16b2308596e084 *inst/examples/namespaces.S
91f1b67dc3de04dea2fcc2f6920541f5 *inst/examples/namespaces1.S
ce0cefc8f25a0994f488073c42e2d900 *inst/examples/newNodes.R
dddf168fd7c5080be74a1576ef5466dd *inst/examples/oop.S
ad089f3655d2f3acee30f7067b4d0b09 *inst/examples/pi.xml
785049ea7cfdae7b5e5e8e6b009982e9 *inst/examples/pmml.R
fd3553a039641a18d7054220a67a7750 *inst/examples/prompt.xml
5441838e027e655b42651f4d382fdd15 *inst/examples/promptXML.R
b14b2b7e24e681eee419ced37bc43948 *inst/examples/promptXML.Sxml
631b33237934928479582f3a3d7ecf82 *inst/examples/rcode.xml
d41d8cd98f00b204e9800998ecf8427e *inst/examples/rcode.xsl
25a7c74cf466eb3daf9dbf493b718e89 *inst/examples/redirection.R
0d41bf0aadbce1279d255f80587d185a *inst/examples/reorder.xml
bb78fe90a650625ae8dbd40ecf3636e2 *inst/examples/sbml.xml
acadb3e6695e45ee3eb7bd786d494383 *inst/examples/sbmlSAX.S
9d49e19ba166fb9d9773c431938cbf06 *inst/examples/schema.xsd
04b016367c23b30873d1e5eca554bb6d *inst/examples/schemaEg.xml
755449de058d8f0300592fa140bbc99e *inst/examples/schemas.xml
603ead1f9155d91d3cdb0a004db097ec *inst/examples/svg.R
c48636aa7b630785d5cc395fb6a0fa5a *inst/examples/tags.Sxml
1b2489004d118db8920a7a3f208c33b7 *inst/examples/trademe_cars.R
94ab383e5c6f30fdcf3f3b13ceef6eb1 *inst/examples/valueFilterDataFrameEvent.R
7df5d486adb60fcde1d84fbd94ae1801 *inst/examples/wordML.R
a215bd1733d94718ed88a9632677eda0 *inst/examples/writeExamples.S
9053aae1a374bd10eee4e701e9c87bdb *inst/examples/xml2tex.Sxml
b976c0a84da79df21f1d90e186402baa *inst/examples/xmlSource.R
411b82529f6cd31fa9b2634f2e2ca152 *inst/examples/xmlTags.xml
5f42960279557bdc33de85c7799f6a0f *inst/examples/xmlTree.R
fdd705555e0c8267f6ef8bb0289390e9 *inst/examples/xpath.R
777186450f8e22d4f650499a8a150c3f *inst/examples/xpath.xml
068dee4e4e15c0f69e8df8a5a8eaa897 *inst/scripts/RSXML.bsh.in
03ae53b54d6d2168fcc7454c14682086 *inst/scripts/RSXML.csh.in
ff1d23fdc64afb10560da4072e5e384e *man/AssignXMLNode.Rd
92ceaf274e4ebe321861b256b2262262 *man/Doctype-class.Rd
21df1aefac7e79053ac327bf4cc187ff *man/Doctype.Rd
8077dfeb6984bc654cf08ac857df5185 *man/SAXMethods.Rd
c351eb7d588e799653622715a5202121 *man/SAXState-class.Rd
d40daeb56cf6466e0ea5c3f789cf87fc *man/XMLAttributes-class.Rd
115aed144ea989230a5ba211355a385f *man/XMLCodeFile-class.Rd
25936f5c421f873b144226aa8d099344 *man/XMLInternalDocument.Rd
b902296706b8fce8cccc5c912f5226a8 *man/XMLNode-class.Rd
80db5e6e8aeaee89d591958ff175655d *man/addChildren.Rd
7e2a071add5e13c3769a75665e83225a *man/addNode.Rd
559d78cddaaf4b1f8b9587b4765931eb *man/addSibling.Rd
c1ac7589d644af035ef4e5e38a7b463d *man/append.XMLNode.Rd
09e56315e0cce3cdd48d0d47ef8510fd *man/asXMLNode.Rd
1be889cae84dc996aa91b836d2edc4a5 *man/asXMLTreeNode.Rd
ad21fc83db2ccac174b48ad02d33aef9 *man/catalogResolve.Rd
6b604e709c29505b312532b77aa99752 *man/catalogs.Rd
635ee70f7d0119f27af30f48e2b05522 *man/coerce.Rd
22684c65804739d57060e84fa8bf4b29 *man/compareXMLDocs.Rd
79fd28a580531816d0028d74ffc5ef57 *man/docName.Rd
209ee649a2c52876b825f6ef1669ad29 *man/dtdElement.Rd
7675558f9687a98c871d7f215f42dc4e *man/dtdElementValidEntry.Rd
78b5b724f2474a4801b60029457a5352 *man/dtdIsAttribute.Rd
3711f5a6d158946f64c609575eb7ecf8 *man/dtdValidElement.Rd
66b32dc1889221b239b8c3c1dfabd839 *man/ensureNamespace.Rd
d3e3902ea4fe25d3857aa794dec075b0 *man/findXInclude.Rd
0ee0f2813c0fc4fc732f8731445f506b *man/free.Rd
6205f11393939b912302330665a4624f *man/genericSAXHandlers.Rd
3d95b1004bd3e03b0d136bf4b3be8eae *man/getChildrenStrings.Rd
9443e267094cee408efe309f7223ab91 *man/getEncoding.Rd
6d94506d430ac22ba6ece4cffc9ddece *man/getHTMLLinks.Rd
ab922d47da651c3656c4f9fc64d3c333 *man/getLineNumber.Rd
ca4905696a784ae7f7b8a8f0c2bfa653 *man/getNodeSet.Rd
e53d3f1962a3276a717453b022b55483 *man/getRelativeURL.Rd
edcefdb7086f06186adad54d9980682a *man/getXIncludes.Rd
2f3fdbde9fb3f351326eca1bfcbc81f0 *man/getXMLErrors.Rd
cfa4e11592ed00f9cb8247d0e59d3f95 *man/isXMLString.Rd
720c4fe165adaf7268bbea26fcf670fd *man/length.XMLNode.Rd
9398e99857eecde62710f84fe3f51137 *man/libxmlVersion.Rd
1cc7d2805bfe8d6d97064f5b5810bfac *man/makeClassTemplate.Rd
bcc6efd399abe7a78f398f445f8410d8 *man/names.XMLNode.Rd
49856050e87fc54c666cf9fcfc328dc7 *man/newXMLDoc.Rd
43e3c93ff036a105e40dd1bf228d7236 *man/newXMLNamespace.Rd
3e0d0d90ec56e3fbbbfbf12f58469aac *man/parseDTD.Rd
02c7cd373c3ea7d6c476df65b87980e4 *man/parseURI.Rd
1c85a344a9e4bfd39cb3df303a2e20cf *man/parseXMLAndAdd.Rd
4a577e2869ec67fd1fdd1d81d017511a *man/print.Rd
d36e4628f65dff3c8d5fe347b126676d *man/processXInclude.Rd
f5607c676182e472a1f2a2a25f9db8e6 *man/readHTMLList.Rd
16e599b591aa177b6527f11665909cd6 *man/readHTMLTable.Rd
8f81d16226120619817fa1a172679a08 *man/readKeyValueDB.Rd
8d15b91eaa346f72664bbb62ab5455fc *man/readSolrDoc.Rd
72281e5a24125d35b55afc9f70555a4a *man/removeXMLNamespaces.Rd
56794c3fb2cd5d4918c32dc8acf10f30 *man/replaceNodeWithChildren.Rd
e036dbeee478708b736ed15360634a80 *man/saveXML.Rd
24b46bf8f25897bacbcb7915b3d54687 *man/schema-class.Rd
d56bfed177c3648de5284ef766500b3a *man/setXMLNamespace.Rd
9f498c1f0936c26ebf3fc70fa4018610 *man/supportsExpat.Rd
c85b172d3a0894a6255c34323d777c04 *man/toHTML.Rd
931a149ae3b82f306c2e4abb70f3347a *man/toString.Rd
af018657db302e2b7ebf0f63d8eb20e3 *man/xmlApply.Rd
033ba024a4023ac984f4dddc6079a8ad *man/xmlAttributeType.Rd
ec6d297236892a9f9497be4de5d99eb0 *man/xmlAttrs.Rd
9a66741994d3e76e3004cbd2977ba735 *man/xmlChildren.Rd
0cdf4d25b1158c4162316028f244586e *man/xmlCleanNamespaces.Rd
36d83a7dcf52b5934a40247a46f4e2a4 *man/xmlClone.Rd
e2f6330711aff2dd1df7c2bef9a0adb6 *man/xmlContainsEntity.Rd
bd88d5b5436f57bd16f896fbb3837495 *man/xmlDOMApply.Rd
368fa01a925a08db5b309623c46b99ed *man/xmlElementSummary.Rd
6254505b1223b4accc51a87197a3154d *man/xmlElementsByTagName.Rd
07dc0ecb650472c2d1880a037ca49a81 *man/xmlEventHandler.Rd
c853f6adda406317023bcdb0913816a5 *man/xmlEventParse.Rd
fd932d88f2c3a3569dbc3ad399d39822 *man/xmlGetAttr.Rd
e7aea11004bc636831c31529a881bbb6 *man/xmlHandler.Rd
27297649f3305d5bace0cdb20a75c705 *man/xmlHashTree.Rd
dab8360df5edc368a530274327f4d245 *man/xmlName.Rd
103e4167eb19e9e61ba10023b90d5e3c *man/xmlNamespace.Rd
9800e219024b4b039230c9537c77c5d4 *man/xmlNamespaceDefinitions.Rd
d24c723d68b6a88a89fb83e31a1596d1 *man/xmlNode.Rd
cd2d11323d9ceea52093309cf8322a9d *man/xmlOutput.Rd
6a443760a3b345e78b99358a0fec28f5 *man/xmlParent.Rd
30122ab8c7c7263901d2b674731955bb *man/xmlParseDoc.Rd
be5f3c2c9f4878e8b4df61c6df89a38c *man/xmlParserContextFunction.Rd
e37360fcaa6ff0394040b2e26e187968 *man/xmlRoot.Rd
927722ec945cb55d1abb0e256f1b992e *man/xmlSchemaValidate.Rd
37b7957855c872a54bb1c60abef9e0b3 *man/xmlSearchNs.Rd
71b54af27e0f35cb4915bb058544e6e4 *man/xmlSerializeHook.Rd
793059a6b10875bfd03765d2162bfa19 *man/xmlSize.Rd
8f67d3e131531756743f1afaa16d76e4 *man/xmlSource.Rd
52b079e639f02b3e4780beed343de405 *man/xmlStopParser.Rd
05356cf28e8d23254acbf87162c9a8ee *man/xmlStructuredStop.Rd
ffa5a30b01090b1d05a5a19c9a3bea68 *man/xmlSubset.Rd
55711b6c3f421eef82d2041c87dec5f4 *man/xmlToDataFrame.Rd
29646a080808de5840f7bcb8994400c9 *man/xmlToList.Rd
15fdef72e6e90aeacf92e9afd126b3b3 *man/xmlToS4.Rd
3d7f4bf5224bfd4eeb21632a4469e711 *man/xmlTree.Rd
55020834b11452c69fcf47c46c5eecff *man/xmlTreeParse.Rd
00f96e20f0a390f00ceefeb78f26e6e6 *man/xmlValue.Rd
70da44b0aa162474cb86decbc2fdb14f *src/DocParse.c
5244014df10ef5a255f890ff7df5424f *src/DocParse.h
b5cb0dd5f43481ce20675e7501996484 *src/EventParse.c
a855ea27bb2ad2247b4bec96992d9730 *src/EventParse.h
380aaf0a76f032635d696cd1f47dda4a *src/ExpatParse.c
21327b0c7a0406a35e0aa87168d02fd9 *src/ExpatParse.h
294efeedc91eecaef4b4917de756d723 *src/HTMLParse.c
f8e657440de4c71940b40c6952e9d9d3 *src/Makevars.in
25e8b4be2f14b1b437eaa4411a30605a *src/Makevars.ucrt
217afdc4385bc2657eb8ee19978d00e5 *src/Makevars.win
4df946341180f6e453b0a9f7e6e19ba5 *src/NodeGC.c
bba8153b861732e55247d7caf8d44817 *src/NodeGC.h
956d2b8925bf7a209732e71104657f58 *src/RSCommon.h
171c784de7fd4530c552c88ca8d39695 *src/RSDTD.c
d51866b3b7de881b366d5f57819156f6 *src/RSDTD.h
c8ef975634bb5874143107dfe9e1546a *src/RS_XML.h
9b5b0ab44c9f48f388cad45e79510132 *src/RUtils.c
816c0e976c2a0a3a0cfb1ff9cff4ac43 *src/Rcatalog.c
dfb2d8cdac46685f4b989e0678dc4747 *src/Utils.c
addb035c852a81af74f1af3a759d6e11 *src/Utils.h
ec5c630530c87bc4a54e37f47a6d1336 *src/XMLEventParse.c
06d148d209a76bd776779709f29f2193 *src/XMLHashTree.c
dfd8201878dd162bb317d550592acf06 *src/XMLTree.c
06dff24eae84b53211ddb5d1ad52dfcb *src/fixNS.c
34157f4ab7a2c930e4d3ea68d14e2a7f *src/libxmlFeatures.c
b0a6144b6614a1489c4639635f448325 *src/schema.c
f9f4a35d94a429e6924e0d3260ba34bb *src/xmlsecurity.c
5ff6882475929a186aadd9b71947eca1 *src/xpath.c
XML/configure.win 0000755 0001751 0000144 00000000246 14361737640 013450 0 ustar hornik users echo "supportsExpat <- function() FALSE" > R/supports.R
echo "supportsLibxml <- function() TRUE" >> R/supports.R
echo "ADD_XML_OUTPUT_BUFFER = FALSE" >> R/supports.R
XML/R/ 0000755 0001751 0000144 00000000000 14405636156 011143 5 ustar hornik users XML/R/flatTree.R 0000644 0001751 0000144 00000014360 13610046417 013030 0 ustar hornik users ## it looks like <<- assignments here should actually be to env.
# Represent the tree as a flat collection of nodes
# but allocate the list ahead of time and grow it
# by doubling the space. This makes things a lot faster
# for large trees.
utils::globalVariables(c('e', 'idx', 'nodeNames', 'nodeSet', 'parentCount'))
## nothing here is exported.
if(FALSE){
xmlFlatListTree =
function(nodes = list(),
parents = character(), children = list(),
env = new.env(),
n = 200)
{
# To make things reasonably fast, we store the nodes in a pre-allocated list
env = structure(env, class = c("XMLFlatListTree", "XMLFlatTree"))
assign("nodeSet", vector("list", n), env)
assign("idx", 1, env)
assign("parentCount", 0, env)
assign("nodeNames", character(n), env)
assign("parents", character(n), env)
#XXX Deal with this if parents is specified.
# Assign the parents and children values and fill in any orphans, etc.
# after calling addNode for the different nodes.
if(!exists(".nodes", env))
env$.nodes <- env #?
# function to generate a new node identifier. Can be given the
# proposed name and will then make one up if that conflicts with another
# identifier.
f = function(suggestion = "") {
if(suggestion == "" || suggestion %in% nodeNames)
as.character(idx + 1)
else
suggestion
}
environment(f) = env
assign( ".nodeIdGenerator", f, env)
g = addParentNode
environment(g) = env
assign(".addParentNode", g, env)
assign(".this", env, env)
assign("n", n, env)
addNode = function(node, parentId) {
node = asXMLTreeNode(node, .this)
id = node$id
# Put it in the nodeSet by position.
nodeSet[[ idx ]] <<- node
nodeNames[idx] <<- id
idx <<- idx + 1
if(inherits(parentId, "XMLTreeNode"))
parentId = parentId$id
if(length(parentId)) {
parentCount <<- parentCount + 1
.parents[ parentCount ] <<- parentId
names(.parents)[parentCount] <<- id
.children [[ parentId ]] <<- c(.children[[ parentId ]], id )
}
if(idx == n) {
n <<- 2*n
length(nodeSet) <<- n
}
return(node)
}
environment(addNode) = env
env$.addNode <- addNode
# Populate the tree with any initial nodes.
# XXX putting these in .nodes and not nodeSet!
ids = names(nodes)
nodes = lapply(seq(along = nodes),
function(i) {
x = nodes[[ i ]]
if(!("id" %in% names(unclass(x))))
x$id = f( ifelse(ids[ i ] == "", xmlName(x), ids[i]) )
if(!inherits(x, "XMLTreeNode")) {
## no 'e' is visible here
x$env = e
class(x) = c("XMLTreeNode", class(x))
}
x
})
names(nodes) = sapply(nodes, function(x) x$id)
env$.nodes <- nodes
env$.parents = parents
env$.children = children
.tidy =
# to be run when adding to the tree is complete.
# This shrinks the vectors to their actual size
# rather than their preallocated sizes.
function() {
idx <- idx - 1
length(nodeSet) <- idx
length(nodeNames) <- idx
names(nodeSet) <- nodeNames
.nodes <<- nodeSet
idx
}
.tidy
environment(.tidy) <- env
env$.tidy = .tidy
env
}
xmlRoot.xmlFlatListTree =
function(x, skip = TRUE, ...)
{
#XXX
stop("not implemented")
}
# Represent the tree as a flat collection of nodes
# combined with
# See tests/tree.R
# Use an environment within the node so that we can lookup the children and parent information
# directly from within
#
# provide tools to set parent and children relationship.
#
# Validate entries for parents and children to ensure nodes exist.
#
# as(, "XMLTreeNode") function to make certain environment, id and class are present.
#
# Suppose we are given an empty xmlTree() object when parsing an XML document.
# Then when we are converting the results back to R, we need to add nodes as we traverse the tree.
# Need to make no
# see convertNode() called in createXMLNode()
# Given out an id within this tree for each node
#
xmlFlatTree =
#
# This version just concatenates each node to an existing list and so suffers
# horrifically from garbage collection.
# We leave it here in case it is useful either directly to someone for use on
# small documents, or for performance comparisons with other approaches.
#
function(nodes = list(), parents = character(), children = list(), env = new.env())
{
# Assign the parents and children values and fill in any orphans, etc.
# after calling addNode for the different nodes.
if(!exists(".nodes", env))
env$.nodes <- env
# function to generate a new node identifier. Can be given the
# proposed name and will then make one up if that conflicts with another
# identifier.
f = function(suggestion = "") {
if(suggestion == "" || suggestion %in% names(.nodes))
as.character(length(.nodes) + 1)
else
suggestion
}
environment(f) = env
assign( ".nodeIdGenerator", f, env)
g = addParentNode
environment(g) = env
assign(".addParentNode", g, env)
assign(".this", env, env)
addNode = function(node, parentId) {
node = asXMLTreeNode(node, .this)
id = node$id
if(length(parentId)) {
.parents[ id ] <<- parentId
.children [[ parentId ]] <<- c(.children[[ parentId ]], id )
}
.nodes[[ id ]] <<- node
id
}
environment(addNode) = env
env$.addNode <- addNode
ids = names(nodes)
nodes = lapply(seq(along = nodes),
function(i) {
x = nodes[[ i ]]
if(!("id" %in% names(unclass(x))))
x$id = f( ifelse(ids[ i ] == "", xmlName(x), ids[i]) )
if(!inherits(x, "XMLTreeNode")) {
## FIXME: there is no visible 'e' here
x$env = e
class(x) = c("XMLTreeNode", class(x))
}
x
})
names(nodes) = sapply(nodes, function(x) x$id)
env$.nodes <- nodes
env$.parents = parents
env$.children = children
structure(env, class = c("XMLSimpleFlatTree", "XMLFlatTree"))
}
}
XML/R/getDependencies.R 0000644 0001751 0000144 00000002346 14405636156 014361 0 ustar hornik users getXIncludes =
function(filename, recursive = TRUE, skip = character(),
omitPattern = "\\.(js|html?|txt|R|c)$",
namespace = c(xi = "https://www.w3.org/2003/XInclude"),
duplicated = TRUE)
{
doc = xmlParse(filename, xinclude = FALSE)
if(missing(namespace)) {
ns = xmlNamespaceDefinitions(doc, simplify = TRUE)
if("https://www.w3.org/2001/XInclude" %in% ns)
namespace = c(xi = "https://www.w3.org/2001/XInclude")
}
nodes = getNodeSet(doc, "//xi:include", namespaces = namespace)
files = sapply(nodes, xmlGetAttr, "href")
nonRecursive = as.logical(sapply(nodes, xmlGetAttr, "text", FALSE))
if(length(omitPattern))
nonRecursive = grepl(omitPattern, files) | nonRecursive
if(recursive) {
processed = c(filename, skip)
for(f in unique(files[!nonRecursive])) {
# path name relative to the base document of the XInclude
f = getRelativeURL(f, filename) # dirname(filename))
if(file.exists(f))
files = c(files, getXIncludes(f, TRUE, skip = processed))
else
warning(f, " doesn't exist")
processed = c(processed, f)
}
}
files = unlist(files)
if(!duplicated)
unique(files)
else
files
}
XML/R/xmlMemoryMgmt.R 0000644 0001751 0000144 00000000733 13607633705 014107 0 ustar hornik users setGeneric("clearMemoryManagement",
function(node, ...)
{
standardGeneric("clearMemoryManagement")
})
setMethod("clearMemoryManagement", "XMLInternalElementNode",
function(node, ...)
{
.Call("R_clearNodeMemoryManagement", node, PACKAGE = "XML")
})
manageMemory_p =
function(finalizer)
{
if(is.character(finalizer) || is(finalizer, "externalptr") ||
inherits(finalizer, c("NativeSymbol", "NativeSymbolInfo")))
return(TRUE)
as.logical(finalizer)
}
XML/R/version.R 0000644 0001751 0000144 00000000756 13610046417 012753 0 ustar hornik users libxmlVersion <-
function(runTime = FALSE)
{
v <- if(runTime) .Call("RS_XML_libxmlVersionRuntime", PACKAGE = "XML")
else .Call( "RS_XML_libxmlVersion", PACKAGE = "XML")
v <- as.character(v)
els <- substring(v, 1:nchar(v), 1:nchar(v))
list(major=els[1], minor=paste(els[2:3],sep="", collapse=""),
patch=paste(els[4:5], sep="", collapse=""))
}
setEntitySubstitution =
function(val)
.Call("RS_XML_SubstituteEntitiesDefault", as.logical(val), PACKAGE = "XML")
XML/R/dynSupports.R 0000644 0001751 0000144 00000000203 13607633667 013642 0 ustar hornik users supportsExpat <-
function()
{
is.loaded("RS_XML_initParser")
}
supportsLibxml <-
function()
{
is.loaded("RS_XML_piHandler")
}
XML/R/tangle.R 0000644 0001751 0000144 00000004060 13607633705 012540 0 ustar hornik users #
# tangle code from an XML file to a collection of files
#
getXPathExpr =
function(language, nodeNames)
{
paste(paste("//", unlist(outer(language, nodeNames, FUN = "paste", sep = ":")), sep = ""), collapse = "|")
}
getTargetFiles =
function(doc, language = names(xmlNamespaceDefinitions(doc)),
nodeNames = c("code", "function", "plot", "class", "method"),
xpath = getXPathExpr(language, nodeNames))
{
if(is.character(doc))
doc = xmlParse(doc)
nodes = getNodeSet(doc, xpath)
ans = structure(sapply(nodes, xmlGetAttr, "file"),
names = sapply(nodes, function(x) names(xmlNamespace(x)) ))
ans = tapply(ans, names(ans), function(x) unique(unlist(x)))
ans[ sapply(ans, length) != 0 ]
}
xmlTangle =
function(doc, files = getTargetFiles(doc, xpath = xpath), dir = ".",
language = names(xmlNamespaceDefinitions(doc)),
nodeNames = c("code", "function", "plot", "class", "method"),
xpath = getXPathExpr(language, nodeNames))
{
if(is.character(doc))
doc = xmlParse(doc)
if(length(files) == 0 && "r" %in% language) {
return(tangleR(doc, out = NA))
}
files =
structure(lapply(names(files),
function(ns) {
xp = paste("//", ns, ":", nodeNames, sep = "")
structure(sapply(files[[ns]],
function(file) {
expr = paste(xp, "[@file=", sQuote(file), "]", collapse = "|")
paste(xpathSApply(doc, expr, xmlValue), collapse = "\n")
}), names = files[[ns]])
}), names = names(files), class = "FileContentsList")
if(!is.na(dir))
save.FileContentsList (files, dir)
else
files
}
save.FileContentsList =
function(x, dir = ".")
{
x = structure(unlist(x, recursive = FALSE), names = unlist(lapply(x, names)))
files = paste(dir, names(x), sep = .Platform$file.sep)
sapply(seq(along = files),
function(i) cat(x[[i]], file = files[i]))
files
}
XML/R/xmlTree.R 0000644 0001751 0000144 00000022717 13607633674 012724 0 ustar hornik users xmlTree <-
#
# Create an XML document using internal nodes and help to manage
# the state for the user rather than requiring them to manage
# the individual nodes. For the most part, the two approaches
# are relatively similar in complexity.
#
#
function(tag = NULL, attrs = NULL, dtd = NULL, namespaces = list(),
doc = newXMLDoc(dtd, namespaces))
# Allows a DOCTYPE, etc. at the beginning by specifying dtd as
# a vector of 1, 2, 3 elements passed to newXMLDTDNode() or
# as an XMLDTDNode directly.
#
{
currentNodes <- list(doc) # the stack of nodes
isXML2 <- libxmlVersion()$major != "1"
# if we are given a DTD, add it to the document.
if(!is.null(dtd)) {
if(isXML2) {
node = NULL
if(inherits(dtd, "XMLDTDNode"))
node = dtd
else if(is.character(dtd) && dtd[1] != "")
node = newXMLDTDNode(dtd, doc = doc)
if(!is.null(node)) {
addChildren(doc, node)
currentNodes[[2]] <- node #???XXX
}
} else
warning("DTDs not supported in R for libxml 1.*. Use libxml2 instead.")
}
definedNamespaces = list()
defaultNamespace = NULL
addNamespaceDefinitions = is.null(tag)
setActiveNamespace = function(ns) {
defaultNamespace <<- ns
}
asXMLNode <- function(x) {
if(inherits(x, "XMLInternalNode"))
return(x)
v = if(is.list(x))
lapply(x, asXMLNode)
else
newXMLTextNode(as.character(x), doc = doc, escapeEntities = is(x, "AsIs"))
v
}
setNamespace <- function(node, namespace = defaultNamespace) {
# if there is no namespace or if we have one and no names on the namespace
if(length(namespace) == 0 || ! ( length(namespace) == 1 && is.null(names(namespace)) ) )
return(NULL)
if(is.list(namespace))
return(NULL)
if(!is.na(match(namespace, names(namespaces))) && is.na(match(namespace, names(definedNamespaces)))) {
ns <- .Call("R_xmlNewNs", node, namespaces[[namespace]], namespace, PACKAGE = "XML")
definedNamespaces[[namespace]] <<- ns
}
setXMLNamespace(node, definedNamespaces[[namespace]])
#old setInternalNamespace( node, definedNamespaces[[namespace]])
}
# namespace is intended to be the namespace for this node
# and not any definitions.
# How do we define new namespaces with this function?
# Can we add them to attrs. No!
addTag <- function(name, ..., attrs = NULL,
close = TRUE, namespace = defaultNamespace, .children = list(...) )
{
if(inherits(name, "XMLInternalNode")) {
addChildren(currentNodes[[1]], name)
currentNodes <<- c(node, currentNodes)
addChildren(node, kids = .children)
if(close)
currentNodes <<- currentNodes[-1]
return(name)
}
# if the user gives us something like "r" for the namespace as opposed to
# c(r = "http:...") then we try to match the prefix in an earlier node
# ??? Should we use the defined namespaces in the document?
if(FALSE) {
if(length(namespace) == 1 && length(names(namespace)) == 0) {
tmp = namespace
if(length(currentNodes)) {
defs = namespaceDeclarations(currentNodes[[1]], TRUE)
i = match(namespace, names(defs))
if(!is.na(i))
namespace = defs[[i]]
}
}
}
if(!is.null(attrs))
storage.mode(attrs) <- "character"
if(inherits(name, "XMLInternalNode"))
node = name
else {
parent = if(length(currentNodes) > 1)
currentNodes[[1]]
else
xmlRoot(currentNodes[[1]])
node <- newXMLNode(name, attrs = attrs, namespace = namespace,
doc = doc, parent = parent,
namespaceDefinitions = if(addNamespaceDefinitions) namespaces else NULL)
if(addNamespaceDefinitions) {
# lapply(seq(along = namespaces),
# function(i)
# setXMLNamespace(node, namespaces[[i]], names(namespaces)[i]))
addNamespaceDefinitions <<- FALSE
}
}
# if(length(currentNodes) > 1)
# addChildren(currentNodes[[1]], node)
currentNodes <<- c(node, currentNodes)
# if(!inherits(name, "XMLInternalNode"))
# setNamespace(node, namespace)
for(i in .children)
addChildren(node, asXMLNode(i)) # vectorize XXX
if(close == TRUE)
closeTag()
invisible(node)
}
closeTag <- function(name="") {
if(nargs() == 0) {
tmp <- currentNodes[[1]]
currentNodes <<- currentNodes[-1]
} else if( is.character(name) ) {
w = sapply(currentNodes, inherits, "XMLInternalElementNode")
useNamespace = length(grep(":", name)) > 0
ids = sapply(currentNodes[ w ], xmlName, useNamespace)
tmp = list()
for(id in name) {
i = which(id == ids)
if(length(i) == 0)
stop("Cannot close tag for node with name ", id, " - no such node open")
tmp = c(tmp, currentNodes[1:i])
currentNodes <<- currentNodes[-c(1:i)]
ids = ids[-(1:i)]
}
} else if(inherits(name, "numeric")) {
num = name
if(is.na(num) || num == -1)
# close all of the nodes, except the document node.
w = seq(along = currentNodes[- length(currentNodes)])
else if(length(num) == 1)
w = 1:num
else
w = num
tmp = currentNodes[ w ]
currentNodes <<- currentNodes[ - w ]
}
invisible(tmp)
}
add = function(node, parent = currentNodes[[1]], close = TRUE) {
if(!is.null(parent)) {
addChildren(parent, node)
if(!close)
currentNodes <<- c(node, currentNodes)
}
invisible(node)
}
addComment <- function(...) {
add(newXMLCommentNode(paste(as.character(list(...)), sep=""), doc = doc))
}
addCData <- function(text) {
add(newXMLCDataNode(text, doc = doc))
}
addPI <- function(name, text) {
add(newXMLPINode(name, text, doc = doc), NULL)
}
# deal with the top-level node the user may have supplied.
if(!is.null(tag)) {
if(is.character(tag)) {
node = addTag(tag, attrs = attrs, namespace = namespaces, close = FALSE)
} else if(inherits(tag, "XMLInternalNode")) {
if(is.null(xmlParent(node))) # if we have a DTD node, need to add it to that or parallel to that?
addChildren(doc, node)
}
}
v <- list(
addTag = addTag,
addNode = addTag,
addCData = addCData,
addPI = addPI,
closeTag = closeTag,
closeNode = closeTag,
addComment = addComment,
setNamespace = setActiveNamespace,
value = function() doc,
doc = function() doc,
add = function(...){}
)
#class(v) <- c("XMLInternalDOM", "XMLOutputStream")
# v
ans = new("XMLInternalDOM", v)
names(ans) = names(v)
ans
}
setAs("XMLInternalNode", "XMLNode",
function(from)
asRXMLNode(from)
)
xmlRoot.XMLInternalDOM =
function(x, skip = TRUE, ...)
{
xmlRoot(x$doc(), skip = skip)
}
#??? This was XMLInternalElement and not ...Node
xmlRoot.XMLInternalElement = xmlRoot.XMLInternalNode =
function(x, skip = TRUE, ...)
{
doc = as(x, "XMLInternalDocument")
if(is.null(doc))
getRootNode(x) # skip = skip - getRootNode doesn't have a skip argument
else
xmlRoot(doc, skip = skip)
}
# Get the name of the file/URI for the document.
setGeneric("docName", function(doc, ...) standardGeneric("docName"))
setMethod("docName", "NULL",
function(doc, ...)
as.character(NA)
)
setMethod("docName", "XMLNode",
function(doc, ...)
as.character(NA)
)
setMethod("docName", "XMLHashTreeNode",
function(doc, ...)
docName(doc$env, ...)
)
docName.XMLInternalDocument =
function(doc, ...)
{
.Call("RS_XML_getDocumentName", doc, PACKAGE = "XML")
}
setMethod("docName", "XMLInternalDocument", docName.XMLInternalDocument)
docName.XMLInternalNode =
function(doc, ...)
{
docName(as(doc, "XMLInternalDocument"))
}
setMethod("docName", "XMLInternalNode", docName.XMLInternalNode)
docName.XMLDocument =
function(doc, ...)
{
doc$doc$file
}
setMethod("docName", "XMLDocument", docName.XMLDocument)
docName.XMLDocumentContent =
function(doc, ...)
{
doc$file
}
setOldClass("XMLDocumentContent")
setMethod("docName", "XMLDocumentContent", docName.XMLDocumentContent)
setGeneric("docName<-", function(x, value)
standardGeneric("docName<-"))
setMethod("docName<-", "XMLInternalDocument",
function(x, value)
{
.Call("RS_XML_setDocumentName", x, value, PACKAGE = "XML")
x
})
# See hashTree.R
setMethod("docName<-", "XMLHashTree",
function(x, value)
{
assign(".doc", value, x)
x
})
parseXMLAndAdd =
function(txt, parent = NULL, top = "tmp", nsDefs = character())
{
txt = paste(txt, collapse = "")
if(!inherits(txt, "AsIs") && length(top) > 0) {
open = sprintf("%s%s", top,
paste(sprintf(' xmlns%s%s="%s"', ifelse(names(nsDefs) != "", ":", ""),
names(nsDefs),
nsDefs),
collapse = ""))
tmp = sprintf("<%s>%s%s>", open, txt, top)
} else
tmp = txt
doc = xmlParse(tmp, asText = TRUE)
if(!is.null(parent))
invisible(.Call("R_insertXMLNode", xmlChildren(xmlRoot(doc)), parent, -1L, FALSE, PACKAGE = "XML"))
else
xmlRoot(doc)
}
XML/R/xmlRoot.R 0000644 0001751 0000144 00000001212 13607633705 012726 0 ustar hornik users setGeneric("xmlRoot<-",
function(x, ..., value)
standardGeneric("xmlRoot<-"))
setMethod("xmlRoot<-", c("XMLInternalDocument", value = "character"),
function(x, ..., value)
{
newXMLNode(value, doc = x)
x
})
setMethod("xmlRoot<-", c("XMLInternalDocument", value = "XMLInternalNode"),
function(x, ..., value)
{
#XXX check that this does the reference counting correctly
# specifically, d = newXMLDoc(); xmlRoot(d) = "bar"; xmlRoot(d) = newXMLNode("foo")
.Call("RS_XML_setRootNode", x, value, PACKAGE = "XML")
x
})
setMethod("xmlRoot<-", "XMLHashTree",
function(x, ..., value)
{
x$.addNode(value)
x
})
XML/R/htmlLists.R 0000644 0001751 0000144 00000003320 13607633665 013254 0 ustar hornik users setGeneric("readHTMLList",
function(doc,
trim = TRUE, elFun = xmlValue,
which = integer(), ...)
standardGeneric("readHTMLList"))
setMethod("readHTMLList",
"character",
function(doc,
trim = TRUE, elFun = xmlValue,
which = integer(), encoding = character(), ...) {
readHTMLList(htmlParse(doc, encoding = encoding), trim, elFun, which, ...)
})
setMethod("readHTMLList",
"HTMLInternalDocument",
function(doc,
trim = TRUE, elFun = xmlValue,
which = integer(), ...) {
lists = getNodeSet(doc, "//ol | //ul | //dl")
if(length(which))
lists = lists[which]
ans = lapply(lists, readHTMLList, trim = trim, elFun = elFun)
if(length(which) == 1)
ans[[1]]
else
ans
})
setMethod("readHTMLList",
"XMLInternalNode",
function(doc,
trim = TRUE, elFun = xmlValue,
which = integer(), ...) {
if(xmlName(doc) == "dl")
return(readHTMLDefinitionList(doc, trim, elFun))
ans = unname(sapply(xmlChildren(doc)[!xmlSApply(doc, is, "XMLInternalTextNode")], elFun))
if(trim)
ans = unname(sapply(ans, function(x) if(is.character(x)) trim(x) else x))
ans
})
readHTMLDefinitionList =
function(node, trim = TRUE, elFun = xmlValue)
{
kids = xmlChildren(node)
structure(sapply(kids[names(node) == "dd"], elFun),
names = sapply(kids[names(node) == "dt"], elFun))
}
XML/R/nodeAccessors.R 0000644 0001751 0000144 00000023266 14405636156 014072 0 ustar hornik users if(!exists("Sys.setenv", baseenv()))
Sys.setenv <- get("Sys.putenv", "package:base")
xmlRoot <-
function(x, skip = TRUE, ...)
{
UseMethod("xmlRoot")
}
xmlRoot.XMLDocument <-
function(x, skip = TRUE,...)
{
# x$children[[1]]
# x$doc
xmlRoot(x$doc, skip = skip,...)
}
xmlRoot.XMLDocumentContent <-
function(x, skip = TRUE, ...)
{
args <- list(...)
a <- x$children[[1]]
if(skip & inherits(a, "XMLCommentNode")) {
which <- sapply(x$children, function(x) !inherits(x, "XMLCommentNode"))
if(any(which)) {
which <- (1:length(x$children))[which]
a <- x$children[[which[1]]]
}
}
a
}
xmlRoot.HTMLDocument <-
function(x, skip = TRUE, ...)
{
x$children[[1]]
}
xmlApply <-
function(X, FUN, ...)
{
UseMethod("xmlApply")
}
xmlSApply <-
function(X, FUN, ...)
{
UseMethod("xmlSApply")
}
xmlApply.XMLNode <-
function(X, FUN, ...) {
lapply(xmlChildren(X), FUN, ...)
}
xmlApply.XMLDocument <-
function(X, FUN, ...)
{
xmlApply(xmlRoot(X), FUN, ...)
}
xmlSApply.XMLDocument <-
function(X, FUN, ...)
{
xmlSApply(xmlRoot(X), FUN, ...)
}
xmlSApply.XMLNode <-
function(X, FUN, ...) {
sapply(xmlChildren(X), FUN, ...)
}
xmlApply.XMLDocumentContent <-
function(X, FUN, ...)
{
xmlSApply(X$children, FUN, ...)
}
xmlSApply.XMLDocumentContent <-
function(X, FUN, ...)
{
xmlSApply(X$children, FUN, ...)
}
xmlValue <-
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE)
{
UseMethod("xmlValue")
}
if(useS4)
setGeneric("xmlValue", function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x))
standardGeneric("xmlValue"))
xmlValue.XMLNode <-
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE)
{
if(recursive && xmlSize(x) > 0) {
kids = xmlChildren(x)
if(ignoreComments)
kids = kids[ !sapply(kids, "XMLCommentNode") ]
return(paste(unlist(lapply(kids, xmlValue, ignoreComments, trim = trim)), collapse = ""))
} else if(!recursive && xmlSize(x) > 0) {
#XXX If !recursive but have text nodes e.g. in the second child.
i = sapply(xmlChildren(x), inherits, "XMLTextNode")
if(any(i))
return(paste(unlist(lapply(xmlChildren(x)[i], xmlValue, ignoreComments, trim = trim)), collapse = ""))
}
# if(xmlSize(x) == 1) # && (inherits(x[[1]], "XMLTextNode"))
# return(xmlValue(x[[1]], ignoreComments))
if(is.null(x$value))
character()
else
if(trim) trim(x$value) else x$value
}
setS3Method("xmlValue", "XMLNode")
xmlValue.XMLTextNode <-
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE)
{
if(!is.null(x$value))
if(trim) trim(x$value) else x$value
else
character(0)
}
setS3Method("xmlValue", "XMLTextNode")
xmlValue.XMLComment <- xmlValue.XMLCommentNode <-
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE)
{
if(ignoreComments)
return("")
if(!is.null(x$value))
if(trim) trim(x$value) else x$value
else
character(0)
}
setS3Method("xmlValue", "XMLCommentNode")
xmlValue.XMLCDataNode <-
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE)
{
if(trim) trim(x$value) else x$value
}
setS3Method("xmlValue", "XMLCDataNode")
xmlValue.XMLProcessingInstruction <-
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE)
{
if(trim) trim(x$value) else x$value
}
setS3Method("xmlValue", "XMLProcessingInstruction")
xmlValue.list = xmlValue.XMLNodeSet =
function (x, ignoreComments = FALSE, recursive = TRUE, encoding = if(length(x)) getEncoding(x[[1]]) else "",
trim = FALSE)
{
sapply(x, xmlValue, recursive = recursive, encoding = encoding, trim = trim)
}
setS3Method("xmlValue", "XMLNodeSet")
"xmlValue.NULL" =
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE)
as.character(NA)
#setS3Method("xmlValue", "NULL")
getSibling.XMLInternalNode =
# Access the next field in the xmlNodePtr object.
# not exported.
function(node, after = TRUE, addFinalizer = NA, ...)
{
if(!inherits(node, "XMLInternalNode"))
stop("can only operate on an internal node")
.Call("RS_XML_getNextSibling", node, as.logical(after), addFinalizer, PACKAGE = "XML")
}
xmlNamespaceDefinitions <-
function(x, addNames = TRUE, recursive = FALSE, simplify = FALSE, ...)
{
UseMethod("xmlNamespaceDefinitions")
}
xmlNamespaces = xmlNamespaceDefinitions
xmlNamespaceDefinitions.XMLInternalDocument =
function(x, addNames = TRUE, recursive = FALSE, simplify = FALSE, ...)
{
r = xmlRoot(x, addFinalizer = FALSE)
while(!is.null(r) && !inherits(r, "XMLInternalElementNode"))
r = getSibling(r, addFinalizer = FALSE)
if(is.null(r))
return(if(simplify) character() else NULL)
xmlNamespaceDefinitions(r, addNames, recursive, simplify)
}
xmlNamespaceDefinitions.XMLNode =
function(x, addNames = TRUE, recursive = FALSE, simplify = FALSE, ...) {
ans = unclass(x)$namespaceDefinitions
if(recursive == TRUE) {
# warning("recursive facility not yet implemented.")
f = function(node) {
if(!inherits(node, "XMLNode") || xmlName(node) == "")
return(FALSE)
ans <<- append.xmlNode(ans, unclass(node)$namespaceDefinitions)
xmlApply(node, f)
}
xmlApply(x, f)
}
if(addNames && length(ans) && length(names(ans)) == 0)
names(ans) = sapply(ans, function(x) x$id)
if(simplify) {
if(length(ans) == 0)
return(character())
ans = structure(sapply(ans, function(x) x$uri),
class = c("SimplifiedXMLNamespaceDefinitions", "XMLNamespaceDefinitions"))
} else if(!is.null(ans))
class(ans) = "XMLNamespaceDefinitions"
ans
}
xmlNamespaceDefinitions.XMLInternalNode =
function(x, addNames = TRUE, recursive = FALSE, simplify = FALSE, ...)
{
ans = .Call("RS_XML_internalNodeNamespaceDefinitions", x, as.logical(recursive), PACKAGE = "XML")
if(addNames && length(ans) > 0)
names(ans) = sapply(ans, function(x) x$id)
if(simplify) {
if(length(ans) == 0)
return(character(0))
ans = sapply(ans, function(x) x$uri)
ans = structure(removeDuplicateNamespaces(ans), class = c("SimplifiedXMLNamespaceDefinitions", "XMLNamespaceDefinitions"))
} else if(!is.null(ans))
class(ans) = "XMLNamespaceDefinitions"
ans
}
setGeneric("getEffectiveNamespaces",
function(node, ...)
standardGeneric("getEffectiveNamespaces"))
tmp =
function(node, ...)
{
ans = xmlNamespaceDefinitions(node)
merge = function(to, what) {
i = !(names(what) %in% names(to))
if(any(i))
ans[names(what)[i]] <<- what[i]
}
tmp = xmlParent(node, manageMemory = FALSE)
while(!is.null(tmp)) {
merge(ans, xmlNamespaceDefinitions(tmp))
tmp = xmlParent(tmp, manageMemory = FALSE)
}
ans
}
setMethod("getEffectiveNamespaces", "XMLInternalNode", tmp)
setMethod("getEffectiveNamespaces", "XMLHashTreeNode", tmp)
setMethod("getEffectiveNamespaces", "XMLNode",
function(node)
xmlNamespaceDefinitions(node))
removeDuplicateNamespaces =
function(ns)
{
dups = duplicated(names(ns))
if(!any(dups))
return(ns)
tapply(ns, names(ns),
function(els) {
if(length(els) == 1)
return(TRUE)
if(length(unique(els)) > 1)
stop("different URIs for the same name space prefix ", names(els)[1])
TRUE
})
ns[!dups]
}
xmlNamespace <-
function(x)
{
UseMethod("xmlNamespace")
}
xmlNamespace.XMLNode <-
function(x)
{
x$namespace
}
#setMethod("xmlNamespace", "character",
xmlNamespace.character =
function(x) {
a = strsplit(x, ":")[[1]]
if(length(a) == 1)
character()
else
a[1]
}
#)
verifyNamespace =
# Check that the namespace prefix in tag (if any)
# has a definition in def that matches the definition of the same prefix in node.
function(tag, def, node)
{
# could have prefix: with no name, but that should never be allowed earlier than this.
ns = strsplit(tag, ":")[[1]]
if(length(ns) == 1)
return(TRUE)
if(! (ns[1] %in% names(def)) )
return(FALSE)
defs = xmlNamespaceDefinitions(node)
if( defs[[ ns[1] ]]$uri != def[ ns[1] ])
stop("name space prefix ", ns, " does not match ", def[ ns[1] ], " but ", defs[[ ns[1] ]] $uri)
TRUE
}
xmlGetAttr <-
#Added support for name spaces.
function(node, name, default = NULL, converter = NULL, namespaceDefinition = character(),
addNamespace = length(grep(":", name)) > 0)
{
a <- xmlAttrs(node, addNamespace)
if(is.null(a) || is.na(match(name, names(a))))
return(default)
if(length(namespaceDefinition))
verifyNamespace(name, namespaceDefinition, node)
if(!is.null(converter))
converter(a[[name]])
else
a[[name]]
}
getXInclude =
function(node, parse = FALSE, sourceDoc = NULL)
{
href = xmlGetAttr(node, "href")
xpointer = xmlGetAttr(node, "xpointer")
if(parse) {
#
# Perhaps just reload the original document
# and see what the difference is. Not guaranteed
# to work since people may have already altered
# the source document.
if(!is.na(href)) {
fileName = paste(dirname(docName(sourceDoc)), href, sep = .Platform$file.sep)
doc = xmlParse(fileName)
} else
doc = sourceDoc
if(!is.na(xpointer)) {
}
} else
c(href = href, xpointer = xpointer)
}
getInclude =
#
#XXX getXIncludeInfo is not defined!
#
function(doc, parse = FALSE)
{
xpathApply(doc, "//xi:include", getXIncludeInfo, parse, docName(doc), doc,
namespaces = c(xi="http://www.w3.org/2001/XInclude"))
}
getXIncludeInfo =
function(node, parse = FALSE, baseURL = character(), doc = NULL)
{
}
XML/R/xmlOutputDOM.R 0000644 0001751 0000144 00000007405 13607633674 013662 0 ustar hornik users setClass("XMLOutputStream", "namedList")
setClass("XMLOutputDOM", contains = "XMLOutputStream")
setClass("XMLOutputBuffer", contains = "XMLOutputStream")
setClass("XMLInternalDOM", contains = "XMLOutputStream")
xmlOutputDOM <-
function(tag = "doc", attrs = NULL, dtd = NULL, nameSpace = NULL, nsURI = character(0),
xmlDeclaration = NULL)
{
buf <- NULL
current <- NULL
startingNode = 1
if(is.logical(xmlDeclaration) && xmlDeclaration)
xmlDeclaration = xmlPINode("xml", 'version = "1.0"')
else if(is.character(xmlDeclaration)) {
if(length(grep('version *=', xmlDeclaration)) == 0)
xmlDeclaration = paste(xmlDeclaration, "version='1.0'")
xmlDeclaration = xmlPINode("xml", xmlDeclaration)
}
if(length(dtd))
dtd = paste(" 1) paste("PUBLIC", ddQuote(dtd[2])), ">")
reset <-
function() {
buf <<- xmlNode(tag, attrs = attrs, namespace = nameSpace)
if(length(nsURI) > 0) {
names(nsURI) <- paste("xmlns", names(nsURI), sep=":")
buf$attributes <<- nsURI
}
current <<- integer(0)
invisible(buf)
}
reset()
addTag <-
function(tag, ..., attrs=NULL, close=TRUE, namespace=NULL, .children = list(...)) {
if(missing(namespace))
namespace <- nameSpace
addNode(n <- xmlNode(tag, attrs= attrs, namespace = namespace, .children = .children))
if(close == FALSE) {
current <<- c(current, xmlSize(getCurrent()))
}
invisible(n)
}
getCurrentExpr <-
function() {
if(length(current) > 0) {
p <- seq(2, length=length(current)-1)
kall <- call("[[", as.name("buf"), current[1])
for(i in p) {
kall <- call("[[", kall, current[i])
}
} else
kall <- as.name("buf")
kall
}
getCurrent <- function() {
eval(getCurrentExpr())
}
# We want to append this to the currently open (or active)
# node as defined by `current'
# d[[1]] <- append.xmlNode(d[[1]], xmlNode("phone"))
addNode <-
function(node) {
kall <- getCurrentExpr()
if(length(current) > 0){
lhs <- kall
kall <- call("append.xmlNode", kall, node)
kall <- call("<<-", lhs, kall)
} else {
kall <- call("append.xmlNode", kall, node)
}
val <- eval(kall)
if(length(current) == 0)
buf <<- val
invisible(node)
}
addComment <- function(...) {
addNode(xmlCommentNode(paste(sapply(list(...), as.character), sep="")))
}
addCData <- function(text) {
addNode(xmlCDataNode(text))
}
addPI <- function(name, text) {
addNode(xmlPINode(name, text))
}
addText <- function(text, namespace = "") {
addNode(xmlTextNode(text, namespace))
}
closeTag <-
function(name="", namespace=NULL) {
# namespace is ignored since we already have the tag name!
current <<- current[-length(current)]
}
getValue =
function() {
# Add DOCTYPE if we have a dtd.
if(!is.null(xmlDeclaration))
structure(list(xmlDeclaration = xmlDeclaration, root = buf, doctype = dtd), class = "XMLRDocument")
else
buf
}
con <- list( value= getValue,
addTag = addTag,
addEndTag = function(name){ closeTag(name)},
closeTag = closeTag,
reset = reset,
addNode = addNode,
add = function(...) {},
addComment = addComment,
addPI = addPI,
addCData = addCData,
current = function(){current}
)
#class(con) <- c("XMLOutputDOM", "XMLOutputStream")
# con
ans = new("XMLOutputDOM", con)
names(ans) = names(con)
ans
}
xmlRoot.XMLRDocument =
function(x, skip = TRUE, ...)
x$root
print.XMLRDocument =
function(x, ...)
{
if(!is.null(x$xmlDeclaration))
print(x$xmlDeclaration)
if(!is.null(x$doctype))
cat(x$doctype, "\n")
print(x$root, ...)
}
XML/R/bitList.R 0000644 0001751 0000144 00000000333 13607633667 012706 0 ustar hornik users bitlist =
# Taken from RAutoGenRuntime
function(...)
{
bitOr = bitops::bitOr
x = unlist(list(...))
if(length(x) == 1)
return(x)
ans = x[1]
for(i in 2:length(x)) {
ans = bitOr(ans, x[i])
}
ans
}
XML/R/xmlToDataFrame.R 0000644 0001751 0000144 00000016724 14205426523 014141 0 ustar hornik users
# Adapt this to be able to specify an XPath expression to identify a list of nodes.
setGeneric("xmlToDataFrame",
#
# Read a relatively flat, 2-level deep XML document into a data frame.
# The document is assumed to be something of the form
#
#
# value
# value
# value
#
#
# value
# value
# value
#
#
#
# This can handle cases where not all observations have the same
# fields.
#
# z = xmlToDataFrame("~/size.xml")
# z = xmlToDataFrame("~/size.xml", c("integer", "integer", "numeric"))
#
function(doc, colClasses = NULL, homogeneous = NA, collectNames = TRUE, nodes = list(), stringsAsFactors = FALSE)
standardGeneric("xmlToDataFrame"))
setMethod("xmlToDataFrame", "character",
# parse the XML document if it is a file name and
# not a regular XML document already.
function(doc, colClasses = NULL, homogeneous = NA, collectNames = TRUE, nodes = list(), stringsAsFactors = FALSE)
xmlToDataFrame(xmlParse(doc), colClasses, homogeneous, collectNames, stringsAsFactors = stringsAsFactors))
setMethod("xmlToDataFrame", c("XMLInternalDocument", nodes = "missing"),
function(doc, colClasses = NULL, homogeneous = NA, collectNames = TRUE, nodes = list(), stringsAsFactors = FALSE)
xmlToDataFrame(doc, colClasses, homogeneous, collectNames, nodes = xmlChildren(xmlRoot(doc)), stringsAsFactors))
tmp =
function(doc, colClasses = NULL, homogeneous = NA, collectNames = TRUE, nodes = list(), stringsAsFactors = FALSE)
{
if(length(nodes) == 0)
return(data.frame())
# Find out how many fields there.
nfields = sapply(nodes, xmlSize)
nvar = max(nfields)
if(collectNames)
varNames = unique(unlist( lapply(nodes, names) ))
else
varNames = names(nodes[[which.max(nfields)]])
if(is.na(homogeneous))
homogeneous = all(nfields == nvar) && all(sapply(nodes[-1], function(x) all(names(x) == varNames)))
if(!homogeneous)
return(fromRaggedXML2DataFrame(nodes, varNames, c(length(nfields), length(varNames)), colClasses, stringsAsFactors))
# Function to operate on each
fun = function(x) {
tmp = xmlSApply(x, xmlValue)
length(tmp) = nvar
tmp
}
# Get the individual values
vals = unlist(lapply(nodes, fun))
ans = matrix(vals, length(nfields), byrow = TRUE)
ans =
if(length(colClasses)) {
as.data.frame(lapply(seq(along = colClasses),
function(i) {
as(ans[, i], colClasses[i])
}), stringsAsFactors = stringsAsFactors)
} else
as.data.frame(ans, stringsAsFactors = stringsAsFactors)
names(ans) = varNames
ans
}
bob =
function(doc, colClasses = NULL, homogeneous = NA, collectNames = TRUE, nodes = list(), stringsAsFactors = FALSE)
xmlToDataFrame(nodes = doc, colClasses = colClasses, homogeneous = homogeneous, collectNames = collectNames, stringsAsFactors = stringsAsFactors)
setMethod("xmlToDataFrame", c(nodes = "XMLNodeSet"), tmp)
setMethod("xmlToDataFrame", c(nodes = "list"), tmp)
setOldClass("XMLInternalNodeList")
setMethod("xmlToDataFrame", c(nodes = "XMLInternalNodeList"), tmp)
setMethod("xmlToDataFrame", "XMLNodeSet", bob)
setMethod("xmlToDataFrame", "XMLInternalNodeList", bob)
setMethod("xmlToDataFrame", "list", bob)
setMethod("xmlToDataFrame", "XMLInternalElementNode",
function(doc, colClasses = NULL, homogeneous = NA, collectNames = TRUE, nodes = list(), stringsAsFactors = FALSE)
xmlToDataFrame(nodes = xmlChildren(doc), colClasses = colClasses, homogeneous = homogeneous, collectNames = collectNames, stringsAsFactors = stringsAsFactors))
fromRaggedXML2DataFrame =
#
# This reads data from the nodes of an XML document and assumes
# that they do not all have the same number or even names of fields.
# So this does extra work to match each observation to the union of
# the field names across all nodes.
#
# o = fromRaggedXML2DataFrame("size2.xml")
# o = fromRaggedXML2DataFrame("size1.xml")
#
function(nodes, varNames = unique(unlist( lapply(nodes, names) )),
dims = c(length(nodes), length(varNames)), colClasses = NULL,
stringsAsFactors = FALSE)
{
#XXX
if(is.character(nodes))
nodes = xmlChildren(xmlRoot(xmlParse(nodes)))
# create an empty data frame with as many rows and columns as needed.
ans = as.data.frame(replicate(dims[2], rep(as.character(NA), dims[1]), simplify = FALSE), stringsAsFactors = FALSE)
names(ans) = varNames
# Fill in the rows based on the names.
for(i in seq(length = dims[1]))
ans[i, names(nodes[[i]])] = xmlSApply(nodes[[i]], xmlValue)
# Convert the columns to the specified classes if specified.
# Should drop cols with NULL. Also guess those with NA.
if(length(colClasses)) {
i = ! sapply(colClasses, is.null)
ans = ans[ i ]
varNames = varNames[i]
colClasses = colClasses[ i ]
ans = as.data.frame(lapply(seq(length = ncol(ans)),
function(i) {
as(ans[, i], colClasses[[i]])
}), stringsAsFactors = stringsAsFactors)
}
names(ans) = varNames
ans
}
setGeneric("xmlAttrsToDataFrame",
function(doc, attrs = character(), omit = character(), ...)
standardGeneric("xmlAttrsToDataFrame"))
setMethod("xmlAttrsToDataFrame", "character",
function(doc, attrs = character(), omit = character(), ...)
xmlAttrsToDataFrame(xmlParse(doc), attrs, omit, ...))
setMethod("xmlAttrsToDataFrame", "AsIs",
function(doc, attrs = character(), omit = character(), ...)
xmlAttrsToDataFrame(xmlParse(doc), attrs, omit, ...))
setMethod("xmlAttrsToDataFrame", "XMLInternalElementNode",
function(doc, attrs = character(), omit = character(), ...)
xmlAttrsToDataFrame(xmlChildren(doc), attrs, omit, ...))
setMethod("xmlAttrsToDataFrame", "XMLNodeSet",
function(doc, attrs = character(), omit = character(), ...) {
xmlAttrsToDataFrame(as(doc, 'list'), attrs, omit, ...)
})
setMethod("xmlAttrsToDataFrame", "list",
function(doc, attrs = character(), omit = character(), ...) {
# assuming these are all nodes.
combineNamedVectors(lapply(doc, xmlAttrs), attrs, omit, ...)
})
setMethod("xmlAttrsToDataFrame", "XMLInternalNodeList",
function(doc, attrs = character(), omit = character(), ...) {
# assuming these are all nodes.
combineNamedVectors(lapply(doc, xmlAttrs), attrs, omit, ...)
})
inAllRecords =
function(x)
{
tt = table(unlist(lapply(x, names)))
names(tt)[ tt == length(x)]
}
allNames =
function(x)
unique( unlist(lapply(x, names)) )
combineNamedVectors =
function(els, attrs = character(), omit = character(), ...)
{
if(is.function(attrs))
attrs = attrs(els)
if(!length(attrs)) {
attrs = allNames(els)
if(length(omit))
attrs = setdiff(attrs, omit)
}
if(length(attrs) == 0) {
warning("no elements to combine across records")
return(data.frame())
}
values = lapply(els, function(x) {
structure(x[attrs], names = attrs)
})
ans = as.data.frame(do.call(rbind, values), row.names = NULL, ...)
rownames(ans) = NULL
ans
}
XML/R/toString.R 0000644 0001751 0000144 00000000414 13610046416 013065 0 ustar hornik users toString.XMLNode <-
function(x, ...)
{
.tempXMLOutput = "" # Put here for codetools to detect the variable locally.
con <- textConnection(".tempXMLOutput", "w", local = TRUE)
sink(con)
print(x)
sink()
close(con)
paste(.tempXMLOutput, collapse="\n")
}
XML/R/namespaceHandlers.R 0000644 0001751 0000144 00000006100 13607633666 014706 0 ustar hornik users namespaceNodeHandlers =
#
# This is to manage a collection of node handlers
# which have handlers for nodes with the same name but in an
# different namespace.
# For example, suppose we have a node array in an R representation
# and another in Matlab but these are differentiated by the namespace.
# r:array and m:array.
# This function arranges to invoke the correct handler when a node is encountered.
#
# namespaceNodeHandlers("r:array" = function(...) ...,
# "m:array" = function(...) ...,
# other = function(...) ...)
#
#
# namespaceNodeHandlers("r:array" = function(...) ...,
# "m:array" = function(...) ...,
# other = function(...) ...,
# nsDefs= c(r = "http://www.r-project.org",
# m = "http://www.mathworks.com") )
#
#
# If there is no handler for a given node, then call the default one
# i.e. .startElement or startElement
#
function(..., .handlers = list(...), nsDefs = NULL, useDotNames = TRUE)
{
# Get the node name and namespace prefix/alias for each of the handler names
tmp = strsplit(names(.handlers), ":")
prefix = sapply(tmp, function(x) if(length(x) > 1) x[1] else "")
nodeNames = sapply(tmp, function(x) if(length(x) > 1) x[2] else x[1])
# Now, find out which ones are duplicated.
w = duplicated(nodeNames)
if(!any(w))
return(.handlers)
dups = nodeNames[w]
# Now, take out the handler functions that have the same node name
# as any other.
w = nodeNames %in% dups
nsHandlers = .handlers[ w ]
# and remove them from .handlers
.handlers = .handlers[ !w ]
# This function will act as the proxy for doing the dispatch for a particular node.
generalHandler =
function(node, ...) {
# Get the node name and the namespace prefix.
id = xmlName(node)
ns = xmlNamespace(node)
if(is.null(ns))
ns = ''
if(length(nsDefs)) {
# get the namespace definition from the node
# and its URI and then match this to the nsDefs
# That gives us the prefix to use
i = (ns == nsDefs)
if(!any(i) && ns != '')
ns = character() #stop("can't match namespace '", as.character(ns), "' in ", paste(nsDefs, collapse = ", "))
else
ns = names(nsDefs)[i]
}
if(length(ns) && ns != "")
tmp = paste(ns, id, sep = ":")
else
tmp = id
f = nsHandlers[[ tmp ]]
# if we didn't find a handler, use the startElement one
if(is.null(f))
f = .handlers[[ if(useDotNames) '.startElement' else 'startElement' ]]
# if we have a handler, call it.
# Otherwise, just return the node... after all that!
if(!is.null(f))
f(node, ...)
else
node
}
.handlers[ dups ] = rep(list(generalHandler), length(dups))
class(.handlers) <- "XMLNamespaceNodeHandlers"
if(length(nsDefs))
class(.handlers) <- c(class(.handlers), "RequiresNamespaceInfo")
.handlers
}
XML/R/serialize.R 0000644 0001751 0000144 00000001354 13607633670 013261 0 ustar hornik users xmlSerializeHook =
function(x)
{
if(inherits(x, c("XMLInternalDocument", "XMLInternalElementNode")))
c(as(x, "character"), class(x)[1])
else if(is(x, "XMLNodeSet"))
c(sapply(x, as, "character"), class(x)[1])
else
NULL
}
xmlDeserializeHook =
function(x)
{
if(length(x) == 2) {
if(x[2] == "XMLInternalElementNode")
xmlRoot(xmlParse(I(x[1])))
else if(x[2] == "XMLNodeSet") {
#XXX we should put these into the same document, but it is hard to make this sensible.
structure(lapply(x, function(x) xmlRoot(xmlParse(I(x)))), "XMLNodeSet")
} else if(x[2] == "XMLInternalDocument")
xmlParse(I(x[1]))
else
stop("Not sure how to handle ", x[2])
} else
xmlParse(I(x))
}
XML/R/XMLClasses.R 0000644 0001751 0000144 00000077233 14405636156 013260 0 ustar hornik users #
# This file contains the definitions of methods
# for operating on the XMLNode objects to make
# the more user-friendly. Specifically, these
# methods are
# print displays the contents of a node and children
# as XML text rather than R/S list
#
# size returns the number of children
#
# name retrieves the tag name
#
# attrs retrieves the attributes element of the XML node
#
# [ and [[ access the children
# (To get at the regular R/S fields in the object, use $
# e.g. node$name, node$attributes, node$value)
#
# In S4/Splus5, we should use the new class mechanism.
#
setS3Method =
function(fun, class) {
if(!useS4)
return()
cat("setting method for", fun, class, "\n")
setMethod(fun, class, get(paste(fun, class, sep = ".")), where = topenv(parent.frame()))
}
if(FALSE)
setOldClass =
function(classes)
{
ancestors = unique(sapply(classes[-1], oldClass))
if(length(ancestors)) {
classes = c(classes[1], ancestors)
oldClassTable[[ classes[1] ]] <<- ancestors
}
methods::setOldClass(classes)
}
# For R 2.7.2 and older. In 2.8.0, extends() for setOldClass() works
# better.
oldClassTable = list(
"XMLNode" = c("RXMLAbstractNode", "XMLAbstractNode"),
"XMLTextNode" = c("XMLNode", "RXMLAbstractNode", "XMLAbstractNode"),
"XMLPINode" = c( "XMLNode", "RXMLAbstractNode", "XMLAbstractNode") ,
"XMLProcessingInstruction" = c( "XMLNode", "RXMLAbstractNode", "XMLAbstractNode") ,
"XMLCommentNode" = c("XMLNode", "XMLTextNode", "RXMLAbstractNode", "XMLAbstractNode"),
"XMLCDataNode" = c("XMLNode", "RXMLAbstractNode", "XMLAbstractNode"),
"XMLHashTree" = c("XMLAbstractDocument"),
"XMLHashTreeNode" = c("RXMLAbstractNode"),
"XMLDocumentContent" = c(),
"XMLDocument" = c("XMLAbstractDocument"),
"XMLHashTree" = c("XMLAbstractDocument"),
"XMLInternalDocument" = c("XMLAbstractDocument"),
"HTMLInternalDocument" = c("XMLInternalDocument", "XMLAbstractDocument"),
"XMLTreeNode" = c("RXMLAbstractNode")
)
oldClass =
function(class)
{
if(version$major == "2" && as.integer(version$minor) >= 8)
return(unique(c(class, extends(class))))
c(class, oldClassTable[[ class ]])
}
###############################
# These were in xmlNodes, but need to be defined earlier.
setOldClass("XMLAbstractDocument")
setOldClass(c("XMLInternalDocument", "XMLAbstractDocument"))
setOldClass(c("XMLHashTree", "XMLAbstractDocument"))
setOldClass(c("XMLDocument", "XMLAbstractDocument"))
#XXXsetOldClass(c("HTMLInternalDocument", "XMLInternalDocument")) # , "XMLAbstractDocument"))
setOldClass(c("HTMLInternalDocument", "XMLInternalDocument", "XMLAbstractDocument"))
setOldClass("XMLAbstractNode")
setOldClass(c("RXMLAbstractNode", "XMLAbstractNode"))
# Why do we have to repeat this class inheritance information?
# We don't!
# setOldClass(c("XMLHashTreeNode", "RXMLAbstractNode", "XMLAbstractNode"))
# setOldClass(c("XMLNode", "RXMLAbstractNode", "XMLAbstractNode"))
# setOldClass(c("XMLTextNode", "XMLNode", "RXMLAbstractNode", "XMLAbstractNode"))
# setOldClass(c("XMLPINode", "XMLNode", "RXMLAbstractNode", "XMLAbstractNode"))
# setOldClass(c("XMLCommentNode", "XMLNode", "XMLTextNode", "RXMLAbstractNode", "XMLAbstractNode"))
# setOldClass(c("XMLProcessingInstruction", "XMLNode", "RXMLAbstractNode", "XMLAbstractNode"))
# setOldClass(c("XMLCDataNode", "XMLNode", "RXMLAbstractNode", "XMLAbstractNode"))
setOldClass(c("XMLHashTreeNode", "RXMLAbstractNode"))
setOldClass(c("XMLNode", "RXMLAbstractNode"))
###setOldClass(c("XMLTextNode", "XMLNode"))
methods::setOldClass(c("XMLTextNode", "XMLTextNode", "XMLNode", "RXMLAbstractNode", "XMLAbstractNode" ))
#setOldClass(c("XMLEntitiesEscapedTextNode", "XMLTextNode", "XMLNode", "RXMLAbstractNode", "XMLAbstractNode"))
#setOldClass(c("XMLEntitiesEscapedTextNode", "XMLTextNode"))
setOldClass(c("XMLPINode", "XMLNode"))
setOldClass(c("XMLCommentNode", "XMLNode"))
setOldClass(c("XMLProcessingInstruction", "XMLNode"))
setOldClass(c("XMLCDataNode", "XMLNode"))
setOldClass(c("XMLTreeNode", "XMLNode", "RXMLAbstractNode", "XMLAbstractNode" ))
setOldClass(c("XMLInternalNode", "XMLAbstractNode"))
setOldClass(c("XMLInternalCDataNode", "XMLInternalNode"))
setOldClass(c("XMLInternalPINode", "XMLInternalNode"))
setOldClass(c("XMLInternalCommentNode", "XMLInternalNode"))
setOldClass(c("XMLInternalElementNode", "XMLInternalNode"))
setOldClass(c("XMLInternalTextNode", "XMLInternalNode"))
setOldClass(c("XMLXIncludeStartNode", "XMLInternalNode"))
setOldClass(c("XMLXIncludeEndNode", "XMLInternalNode"))
setOldClass(c("XMLEntityDeclNode", "XMLInternalNode"))
setOldClass(c("XMLAttributeDeclNode", "XMLInternalNode"))
setOldClass(c("XMLDocumentNode", "XMLInternalNode"))
setOldClass(c("XMLDocumentTypeNode", "XMLInternalNode"))
setOldClass(c("XMLDocumentFragNode", "XMLInternalNode"))
setOldClass(c("XMLNamespaceDeclNode", "XMLInternalNode"))
setOldClass(c("XMLAttributeNode", "XMLInternalNode"))
setOldClass(c("XMLDTDNode", "XMLInternalNode"))
setOldClass("XMLNamespace")
setOldClass("XMLNamespaceDefinition")
setOldClass("XMLNamespaceDefinitions")
#setOldClass("XMLInternalDOM")
setOldClass(c("SimplifiedXMLNamespaceDefinitions", "XMLNamespaceDefinitions"))
#setClass("XPathNodeSet", representation(ref = "externalptr"))
setAs("XMLDocument", "XMLInternalDocument",
function(from) {
xmlParse(saveXML(from$doc$children$doc))
})
###################
############
#setMethod("[[", c("XMLInternalElementNode", "numeric") ,
"[[.XMLInternalElementNode" =
function(x, i, j, ..., exact = NA, namespaces = xmlNamespaceDefinitions(x, simplify = TRUE), addFinalizer = NA)
{
if(is(i, "numeric"))
.Call("R_getNodeChildByIndex", x, as.integer(i), addFinalizer, PACKAGE = "XML")
else
NextMethod()
}
xmlChildren <-
function(x, addNames = TRUE, ...)
{
UseMethod("xmlChildren")
}
setGeneric("xmlParent",
function(x, ...)
standardGeneric("xmlParent"))
xmlChildren.XMLNode <-
#
# Retrieve the list of children (sub-nodes) within
# an XMLNode object.
#
function(x, addNames = TRUE, ...)
{
structure(if(length(x$children)) x$children else list(), class = "XMLNodeList")
}
if(useS4) {
setGeneric("xmlChildren", function(x, addNames = TRUE, ...) standardGeneric("xmlChildren"))
setMethod("xmlChildren", "XMLNode", xmlChildren.XMLNode)
}
if(useS4) {
setGeneric("xmlName", function(node, full = FALSE) standardGeneric("xmlName"))
setMethod("xmlName", "XMLCommentNode", function(node, full = FALSE) "comment")
setMethod("xmlName", "XMLNode",
function(node, full = FALSE)
{
ns = unclass(node)$namespace
if(!full || is.null(ns) || ns == "")
return(node$name)
#
if(!is.character(ns)) {
tmp = ns$id
} else if(inherits(ns, "XMLNamespace"))
tmp = names(ns)
else
tmp = ns
if(length(tmp))
paste(tmp, unclass(node)$name, sep=":")
else
unclass(node)$name
})
} else {
xmlName <-
function(node, full = FALSE)
{
UseMethod("xmlName", node)
}
xmlName.XMLComment <-
function(node, full = FALSE) {
return("comment")
}
xmlName.XMLNode <-
#
# Get the XML tag name of an XMLNode object
#
function(node, full = FALSE)
{
ns = unclass(node)$namespace
if(!full || is.null(ns) || ns == "")
return(node$name)
#
if(!is.character(ns)) {
tmp = ns$id
} else if(inherits(ns, "XMLNamespace"))
tmp = names(ns)
else
tmp = ns
if(length(tmp))
paste(tmp, unclass(node)$name, sep=":")
else
unclass(node)$name
}
}
xmlAttrs <-
function(node, ...)
{
UseMethod("xmlAttrs", node)
}
xmlAttrs.XMLNode <-
#
# Get the named list of attributes
# for an XMLNode object.
#
function(node, ...)
{
node$attributes
}
if(useS4)
setMethod("xmlAttrs", "XMLNode", xmlAttrs.XMLNode)
"[.XMLNode" <-
#
# Extract the children (sub-nodes) within
# the specified object identified by ...
# and return these as a list
#
function(x, ..., all = FALSE)
{
obj <- xmlChildren(x) # x$children
if(all) # "all" %in% names(list(...)) && list(...)[["all"]] == TRUE)
structure(obj[ names(obj) %in% list(...)[[1]] ], class = "XMLNodeList")
else
structure(obj[...], class = "XMLNodeList") # NextMethod("[")
}
"[[.XMLDocumentContent" <-
function(x, ...)
{
x$children[[...]]
}
"[[.XMLNode" <-
#
# Extract the children (sub-nodes) within
# the specified object identified by ...
#
function(x, ...)
{
xmlChildren(x)[[...]]
}
names.XMLNode <-
function(x)
{
# names(xmlChildren(x))
xmlSApply(x, xmlName)
}
"names<-.XMLNode" <-
function(x, value)
{
names(x$children) <- value
x
}
length.XMLNode <-
function(x)
{
xmlSize(x)
}
xmlSize <-
#
# The number of elements within (or length of) a collection
#
function(obj)
{
UseMethod("xmlSize", obj)
}
xmlSize.XMLDocument <-
function(obj)
{
return(length(obj$doc$children))
}
xmlSize.default <-
#
# The number of elements within (or length of) a collection
#
function(obj)
{
length(obj)
}
xmlSize.XMLNode <-
#
# Determine the number of children (or sub-nodes) within an XML node.
#
function(obj)
{
length(obj$children)
}
print.XMLComment <- print.XMLCommentNode <-
function(x, ..., indent = "", tagSeparator = "\n")
{
if(is.logical(indent) && !indent)
indent <- ""
cat(indent, "", tagSeparator, sep="")
}
print.XMLTextNode <-
function(x, ..., indent = "", tagSeparator = "\n")
{
if(is.logical(indent) && !indent)
indent <- ""
if(inherits(x, "EntitiesEscaped"))
txt = xmlValue(x)
else
txt = insertEntities( xmlValue(x) )
cat(indent, txt, tagSeparator, sep="")
}
setAs("XMLNamespaceDefinitions", "character",
function(from) {
if(length(from) == 0)
return(character())
ans = structure(sapply(from, function(x) x$uri), names = sapply(from, function(x) x$id))
if(length(names(ans)) == 0)
names(ans) = ""
ans
})
setAs("character", "XMLNamespaceDefinitions",
function(from) {
ids = names(from)
if(is.null(ids))
ids = rep("", length(from))
structure(lapply(seq(along = from),
function(i)
structure(list(id = ids[[i]], uri = from[i], local = TRUE), class = "XMLNamespace")),
class = "XMLNamespaceDefinitions",
names = ids)
})
print.XMLNode <-
#
# displays a node and attributes (and its children)
# in its XML format.
#
function(x, ..., indent = "", tagSeparator = "\n")
{
if(length(xmlAttrs(x))) {
tmp <- paste(names(xmlAttrs(x)),paste("\"", insertEntities(xmlAttrs(x)), "\"", sep=""), sep="=", collapse=" ")
} else
tmp <- ""
if(length(x$namespaceDefinitions) > 0) {
k = as(x$namespaceDefinitions, "character")
ns = paste("xmlns", ifelse(nchar(names(k)), ":", ""), names(k), "=", ddQuote(k), sep = "", collapse = " ")
# ns <- paste(sapply(x$namespaceDefinitions,
# function(x) {
# paste("xmlns", if(nchar(x$id) > 0) ":" else "", x$id, "=", "\"", x$uri, "\"", sep="")
# }), collapse=" ")
} else
ns <- ""
# Add one space to the indentation level for the children.
# This will accumulate across successive levels of recursion.
subIndent <- paste(indent, " ", sep="")
if(is.logical(indent) && !indent) {
indent <- ""
subIndent <- FALSE
}
if (length(xmlChildren(x)) == 0) {
## Empty Node - so difference is
cat(indent, paste("<", xmlName(x, TRUE), ifelse(tmp != "",
" ", ""), tmp, ifelse(ns != "", " ", ""), ns, "/>", tagSeparator,
sep = ""), sep = "")
} else if (length(xmlChildren(x))==1 &&
inherits(xmlChildren(x)[[1]],"XMLTextNode")) {
## Sole child is text node, print without extra white space.
cat(indent, paste("<", xmlName(x, TRUE), ifelse(tmp != "",
" ", ""), tmp, ifelse(ns != "", " ", ""), ns, ">",
sep = ""), sep = "")
kid = xmlChildren(x)[[1]]
if(inherits(kid, "EntitiesEscaped"))
txt = xmlValue(kid)
else
txt = insertEntities( xmlValue(kid) )
cat(txt,sep="")
cat(paste("", xmlName(x, TRUE), ">", tagSeparator,
sep = ""), sep = "")
} else {
cat(indent, paste("<", xmlName(x, TRUE), ifelse(tmp != "",
" ", ""), tmp, ifelse(ns != "", " ", ""), ns, ">", tagSeparator,
sep = ""), sep = "")
for (i in xmlChildren(x))
print(i, indent = subIndent, tagSeparator = tagSeparator)
cat(indent, paste("", xmlName(x, TRUE), ">", tagSeparator,
sep = ""), sep = "")
}
}
print.XMLEntityRef <-
function(x, ..., indent="", tagSeparator = "\n")
{
if(is.logical(indent) && !indent)
indent <- ""
cat(indent, x$value)
}
print.XMLCDataNode <-
function(x, ..., indent="", tagSeparator = "\n")
{
if(is.logical(indent) && !indent)
indent <- ""
cat(indent, "", tagSeparator, sep = "")
}
print.XMLProcessingInstruction <-
function(x, ..., indent="", tagSeparator = "\n")
{
if(is.logical(indent) && !indent)
indent <- ""
cat(indent, paste("", x$name," ", x$value, "?>", tagSeparator, sep=""), sep = "")
}
xmlElementsByTagName <-
#
# Extract all the sub-nodes within an XML node
# with the tag name `name'.
#
function(el, name, recursive = FALSE)
{
kids = xmlChildren(el)
idx = (names(kids) == name)
els = kids[idx]
# idx <- (names(el$children) == name)
# els = el$children[idx]
if(!recursive || xmlSize(el) == 0)
return(els)
subs = xmlApply(el, xmlElementsByTagName, name, TRUE)
subs = unlist(subs, recursive = FALSE)
append.xmlNode(els, subs[!sapply(subs, is.null)])
}
getDefaultNamespace =
function(doc, ns = xmlNamespaceDefinitions(doc, simplify = simplify), simplify = FALSE)
{
if(length(ns) == 0)
return(character())
i = which(names(ns) == "")
if(length(i))
ns[i]
else
character()
# val = unlist(sapply(ns, function(x) if(x$id == "") x$uri))
# if(length(val))
# val[1]
# else
# character()
}
matchNamespaces =
# d = xmlTreeParse("data/namespaces.xml", useInternal = TRUE)
# "omg"
# c("ns", "omegahat", "r")
# c("ns", omegahat = "http://www.omegahat.net", "r")
# c("ns" = "http://www.omegahat.net", "omg" = "http://www.omegahat.net/XML", "r")
#
# Error because z and rs are not defined in the document.
# matchNamespaces(d, c("omg", "z", "rs"))
#
#
function(doc, namespaces,
nsDefs = xmlNamespaceDefinitions(doc, recursive = TRUE, simplify = FALSE),
defaultNs = getDefaultNamespace(doc, simplify = TRUE)
)
{
# 3 cases:
# i) we are given a single string (e.g. "r") which we use as a prefix for the default namespace
# ii) we are given a vector of namespaces, but one has no name and we want to use that as the
# prefix for the default namespace
# e.g. sum(names(namespaces) == "") == 1)
# iii) given several elements with no name and we match these to those in the document
# if the first doesn't have a match, we use it as the default one.
# iv) mixture of prefix = uri values and strings with no names.
#
# if it is a single "prefix" and we have a default namespace, then map the prefix to the default URI
# and return.
if(is.character(namespaces) && length(namespaces) == 1 &&
is.null(names(namespaces)) && length(defaultNs) > 0) {
tmp = defaultNs
names(tmp)[names(tmp) == ""] = namespaces
# make certain to convert to c(id = url) form from an XMLNamespaceDefinition
tmp = as(tmp[[1]], "character")
# if no name, so default namespace, then use the one in namespaces.
if(length(names(tmp)) == 0 || names(tmp) == "")
names(tmp) = namespaces
return(tmp)
}
# fix the names so that we have empty ones if we have none at all.
if(is.null(names(namespaces)))
names(namespaces) = rep("", length(namespaces))
# which need to be fixed up.
i = (names(namespaces) == "")
if(any(i)) {
# from parameters now: nsDefs = xmlNamespaceDefinitions(xmlRoot(doc), recursive = TRUE)
# deal with the first one as a special case. If this has no match,
# we will map it to the default namespace's URI.
if(i[1] && length(defaultNs) && is.na(match(namespaces[1], names(nsDefs)))) {
names(namespaces)[1] = namespaces[1]
namespaces[1] = defaultNs
msg = paste("using", names(namespaces)[1], "as prefix for default namespace", defaultNs)
e = simpleWarning(msg)
class(e) = c("XPathDefaultNamespace", class(e))
warning(e)
i[1] = FALSE
}
if(sum(i) > 0) {
# So there is at least one namespace without a name.
# See if there are duplicates
dups = names(nsDefs)[duplicated(names(nsDefs))]
tmp = match(namespaces[i], dups)
if(length(dups) > 0 && any(is.na(tmp)))
stop("duplicate namespaces, so cannot match namespace prefix(es) ",
paste(namespaces[i][is.na(tmp)], collapse = ", "),
" in ", paste(unique(names(nsDefs)), collapse= ", "))
idx = match(namespaces[i], names(nsDefs))
if(any(is.na(idx)))
stop("cannot find defined namespace(s) with prefix(es) ", paste(namespaces[i][is.na(idx)], collapse = ", "))
names(namespaces)[i] = namespaces[i]
namespaces[i] = sapply(nsDefs[idx], function(x) x$uri)
# warning("namespaces without a name/prefix are not handled as you might expect in XPath. Use a prefix")
} else if(length(defaultNs) == 0)
stop("There is no default namespace on the target XML document")
}
if(!is.character(namespaces) || ( length(namespaces) > 1 && length(names(namespaces)) == 0))
stop("Namespaces must be a named character vector")
if(length(namespaces) && (length(names(namespaces)) == 0 || any(names(namespaces) == "")))
warning("namespaces without a name/prefix are not handled as you might expect in XPath. Use a prefix")
namespaces
}
getNodeSet =
function(doc, path, namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE), fun = NULL, sessionEncoding = CE_NATIVE,
addFinalizer = NA, ...)
{
xpathApply(doc, path, fun, ..., namespaces = namespaces, sessionEncoding = sessionEncoding, addFinalizer = addFinalizer)
}
xpathSApply =
function(doc, path, fun = NULL, ... , namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, simplify = TRUE, addFinalizer = NA)
{
answer = xpathApply(doc, path, fun, ..., namespaces = namespaces, resolveNamespaces = resolveNamespaces,
addFinalizer = addFinalizer)
# Taken from sapply
if (simplify && length(answer) && length(common.len <- unique(unlist(lapply(answer,
length)))) == 1) {
if (common.len == 1)
unlist(answer, recursive = FALSE)
else if (common.len > 1)
array(unlist(answer, recursive = FALSE), dim = c(common.len,
length(answer)), dimnames = if (!(is.null(n1 <- names(answer[[1]])) &
is.null(n2 <- names(answer))))
list(n1, n2))
else answer
}
else answer
}
xpathApply =
#
# the caller can give the same prefixes of the namespaces defined in
# the target document as simple names.
#
# xpathApply(d, "/o:a//c:c", fun = NULL, namespaces = c("o", "c"))
#
#
function(doc, path, fun = NULL, ... , namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, addFinalizer = NA, xpathFuns = list())
{
UseMethod("xpathApply")
}
xpathApply.XMLNode =
function(doc, path, fun = NULL, ... , namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, addFinalizer = NA, xpathFuns = list())
{
stop("xpathApply/xpathSApply/getNodeSet require an XML/HTML internal document or node. Use xmlParse() or htmlParse()")
}
toXMLNode =
#
# For taking an internal node and converting it to an R-level node
#
function(x, ...)
{
txt = saveXML(x)
xmlRoot(xmlTreeParse(txt, asText = TRUE))
}
xpathApply.XMLNode =
function(doc, path, fun = NULL, ... , namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, addFinalizer = NA, xpathFuns = list(), .node = NULL, noMatchOkay = FALSE)
{
idoc = xmlParse(saveXML(doc), asText = TRUE)
ans = xpathApply(idoc, path, fun, ..., namespaces = namespaces, resolveNamespaces = resolveNamespaces,
.node = .node, noMatchOkay = noMatchOkay, xpathFuns = xpathFuns)
# Now convert the results
if(length(ans))
ans = lapply(ans, toXMLNode)
ans
}
xpathApply.XMLInternalDocument =
function(doc, path, fun = NULL, ... , namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, addFinalizer = NA, xpathFuns = list(), .node = NULL, noMatchOkay = FALSE,
sessionEncoding = CE_NATIVE, noResultOk = FALSE) # native
{
path = paste(path, collapse = " | ")
if(is(namespaces, "list") && all(sapply(namespaces, is, "XMLNamespaceDefinition"))) {
namespaces = structure(sapply(namespaces, `[[`, "uri"), names = names(namespaces))
}
if(resolveNamespaces && !inherits( namespaces, "XMLNamespaceDefinitions"))
namespaces = matchNamespaces(doc, namespaces)
if(!is.null(fun) && !is.call(fun))
fun = match.fun(fun)
# create an expression of the form fun(x, ...) and the C code will insert x for each node.
args = list(...)
if(length(args))
fun = as.call(c(fun, append(1, args)))
#XXX Match the session encoding c("native" = 0, utf8 = 1, latin1 = 2)
encoding = if(is.integer(sessionEncoding))
sessionEncoding
else
getEncodingREnum(sessionEncoding)
anonFuns = NULL
if(is.character(xpathFuns))
xpathFuns = as.list(xpathFuns)
else
anonFuns = xpathFuns[ vapply(xpathFuns, is.function, FALSE) ]
ans = .Call("RS_XML_xpathEval", doc, .node, as.character(path), namespaces, fun, encoding, addFinalizer, xpathFuns, anonFuns, PACKAGE = "XML")
if(!noMatchOkay && length(ans) == 0 && length(getDefaultNamespace(xmlRoot(doc))) > 0) {
tmp = strsplit(path, "/")[[1]]
# if they have a function call, ignore.
tmp = tmp[ - grep("\\(", path) ]
if(length(grep(":", tmp)) != length(tmp) && !noResultOk)
warning("the XPath query has no namespace, but the target document has a default namespace. This is often an error and may explain why you obtained no results")
}
ans
}
xmlDoc =
function(node, addFinalizer = TRUE)
{
if(!is(node, "XMLInternalElementNode"))
stop("xmlDoc must be passed an internal XML node")
doc = .Call("RS_XML_createDocFromNode", node, PACKAGE = "XML")
addDocFinalizer(doc, addFinalizer)
doc
}
# Used to use
# getDefaultNamespace(doc)
if(FALSE) {
xpathApply.XMLInternalNode =
#
# There are several situations here.
# We/libxml2 needs a document to search.
# We have a node. If we use its document, all is fine, but the
# search will be over the entire document. We may get nodes from other sub-trees
# that do not pertain to our starting point (doc).
# Alternatively, we can create a new doc with this node as the top-level node
# and do the search. But then we end up with new nodes. So if you want to find
# nodes in the original document in order to change them rather than just read information
# from them, then you will be sorely mistaken when you think your changes have been applied
# to the original document.
#
# Regardless of what we do, we still have to figure out the adding of the doc attribute.
#
function(doc, path, fun = NULL, ... , namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE)
{
addDocAttribute = FALSE
# This is a wholesale copy.
addDocAttribute = TRUE
doc = xmlDoc(doc, TRUE)
#
# If the doc is already there, can't we just use that without copying it? Yes.
# XXX maybe not. Looks like libxml2 starts at the top of the doc again.
# But if there is no doc for this node, then we create a new doc and
# put a finalizer on it. But we attach the document as an attribute to each of the
# the resulting nodes. Then it will be protected from gc() and so will the nodes
# until each of the nodes are released.
#XXX??? What if we do a subsequent search on another of these nodes.
# Then need to add it to the results.
if(FALSE) {
tmp = as(doc, "XMLInternalDocument")
addDocAttribute = is.null(tmp)
if(is.null(tmp)) {
if(!is.null(attr(doc, "document")))
doc = attr(doc, "document")
else
doc = newXMLDoc(node = doc) # if we used xmlDoc(doc), no finalizer.
} else
doc = tmp
}
ans = xpathApply(doc, path, fun, ..., namespaces = namespaces, resolveNamespaces = resolveNamespaces)
if(addDocAttribute && length(ans))
ans = lapply(ans, function(x) { attr(x, "document") = doc; x})
ans
}
} # end if if(FALSE)
getRootNode =
function(node)
{
p = node
while(!is.null(xmlParent(p)))
p = xmlParent(p)
p
}
xpathApply.XMLInternalNode =
xpathSubNodeApply =
#
# This allows us to use XPath to search within a sub-tree of the document, i.e.
# from a particular node.
# This is slightly tricky because the libxml2 routines require a document to search.
# We could copy the nodes to a new document, e.g.
# xmlDoc(node)
# but then the results would be for new nodes, not the original ones.
# So we would not be able to find nodes and then modify them as we would be modifying the
# copies.
#
# If this what is desired, use
# doc = xmlDoc(node)
# xpathApply(doc, xpath, ...)
# and then that is what you will get.
#
# In the case that you want the original nodes in the result,
# then we have to do a little bit of work. We create a new document
# and set the source node as its root node. We arrange to
# a) put the node back where it came from and b) free the document.
# So there is no memory leak.
#
# The other thing we must do is to find the
#
#
# This version avoids doing any copying of nodes when there is a document already
# associated with the nodes.
#
function(doc, path, fun = NULL, ...,
namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, addFinalizer = NA, xpathFuns = list())
{
path = paste(path, collapse = " | ")
node = doc
addDocAttribute = FALSE
createdNewDocument = FALSE
tmp = as(doc, "XMLInternalDocument")
putBack =
function(node, info) {
if(!is.null(info$left))
addSibling(info$left, node)
else if(!is.null(info$right))
addSibling(info$right, node, after = FALSE)
else if(!is.null(info$parent))
addChildren(info$parent, node)
else if(!is.null(tmp))
addChildren(tmp, node)
}
info = list(parent = xmlParent(node),
left = getSibling(node, after = FALSE),
right = getSibling(node, after = TRUE))
# The approaches here are to create a new empty document and then set the node
# to be its root. We don't set the document for each of the sub-nodes but just this
# top-level node. Then we arrange that when we end this function, we discard the
# newly created document and put the node back into the original tree in its
# original position.
# This involves knowing the parent and the position at which to put the node back into the tree.
# If it is the top most node, i.e. no parent, then it is simple - just set the parent back to NULL.
# If it has a parent, but no siblings, just set the parent.
# And if it has a sibling, put it back next to that sibling.
# If it is the first child, put to the left of the sibling.
# If it is not, put to the right.
# Need to make certain the resulting nodes have the original document
# Use xmlSetTreeDoc rather than node->doc = node as this is recursive.
# And so this is now all done in the on.exit() via the call to RS_XML_setDocEl()
# doc = newXMLDoc(node = node, addFinalizer = getNativeSymbolInfo("R_xmlFreeDocLeaveChildren")$address)
doc = newXMLDoc(addFinalizer = FALSE)
parent = xmlParent(node)
.Call("RS_XML_setRootNode", doc, node, PACKAGE = "XML")
on.exit({ .Call("RS_XML_unsetDoc", node, unlink = TRUE, parent, TRUE, PACKAGE = "XML")
.Call("RS_XML_freeDoc", doc, PACKAGE = "XML")
if(!is.null(tmp)) {
# Need to create a new document with the current node as the root.
# When we are finished, we have to ensure that we put the node back into the original document
# We can use the same mechanism as when we have to create the document from scratch.
.Call("RS_XML_setDocEl", node, tmp, PACKAGE = "XML")
}
putBack(node, info)
})
docName(doc) = paste("created for xpathApply for", path, "in node", xmlName(node))
ans = xpathApply(doc, path, NULL, namespaces = namespaces, resolveNamespaces = resolveNamespaces, addFinalizer = addFinalizer, xpathFuns = xpathFuns)
if(length(ans) == 0)
return(ans)
# now check if the result was actually a descendant of our top-level node for this
# query. It is possible that it arose from a different sub-tree.
w = sapply(ans, function(el) .Call("RS_XML_isDescendantOf", el, node, strict = FALSE, PACKAGE = "XML"))
ans = ans[w]
# if(FALSE && addDocAttribute && length(ans))
# ans = lapply(ans, function(x) { attr(x, "document") = doc; x})
# if(createdNewDocument)
# # Need to remove the links from these nodes to the parent.
# lapply(ans, function(x) .Call("RS_XML_unsetDoc", x, unlink = FALSE, TRUE))
if(!is.null(fun))
lapply(ans, fun, ...)
else
ans
}
if(TRUE)
xpathApply.XMLInternalNode =
function(doc, path, fun = NULL, ...,
namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, addFinalizer = NA, xpathFuns = list())
{
ndoc = as(doc, "XMLInternalDocument")
if(is.null(ndoc))
xpathSubNodeApply(doc, path, fun, ..., namespaces = namespaces, resolveNamespaces = resolveNamespaces, addFinalizer = addFinalizer, xpathFuns = xpathFuns)
else
xpathApply.XMLInternalDocument(ndoc, path, fun, ...,
namespaces = namespaces, resolveNamespaces = resolveNamespaces,
.node = doc, addFinalizer = addFinalizer, xpathFuns = xpathFuns)
}
xpathApply.XMLDocument =
#xpathApply.XMLNode =
function(doc, path, fun = NULL, ... , namespaces = xmlNamespaceDefinitions(doc, simplify = TRUE),
resolveNamespaces = TRUE, .node = NULL, addFinalizer = NA, xpathFuns = list())
{
txt = saveXML(doc)
doc = xmlParse(txt, asText = TRUE)
ans = xpathApply(doc, path, fun, ..., namespaces = namespaces, resolveNamespaces = resolveNamespaces, .node = .node,
addFinalizer = addFinalizer, xpathFuns = xpathFuns)
if(length(ans))
ans = lapply(ans, toXMLNode)
ans
# stop("XPath expressions cannot be applied to R-level nodes. Use xmlParse() to process the document and then use xpathApply()")
}
# d = xmlTreeParse("data/book.xml", useInternal = TRUE)
# ch = getNodeSet(d, "//chapter")
# xpathApply(ch[[1]], "//section/title", xmlValue)
# d = xmlTreeParse("data/mtcars.xml", useIntern = TRUE); z = getNodeSet(d, "/dataset/variables")
# xpathApply(z[[1]], "variable[@unit]", NULL, namespaces = character())
getXMLPath =
function(node, defaultPrefix = "ns")
{
paste(unlist(c("", xmlAncestors(node, xmlName, defaultPrefix))), collapse = "/")
}
xmlAncestors =
function(x, fun = NULL, ..., addFinalizer = NA, count = -1L)
{
ans = list()
tmp = x
while(!is.null(tmp)) {
if(!is.null(fun))
ans = c(fun(tmp, ...), ans)
else
ans = c(tmp, ans)
if(count > 0 && length(ans) == count)
break
tmp = xmlParent(tmp, addFinalizer = addFinalizer)
}
ans
}
XML/R/xmlString.R 0000644 0001751 0000144 00000004150 13607633674 013262 0 ustar hornik users setClass("XMLString", contains = "character")
xml =
function(x)
{
new("XMLString", x)
}
isXMLString =
function(str)
{
is(str, "XMLString") || length(grep("<([a-zA-Z]+:)?[a-zA-Z]+(/?>| [a-zA-Z]+=[\"'])", str)) > 0
}
RXMLNamespaces =
c(r = "http://www.r-project.org",
rh = "http://www.r-project.org/help")
xmlParseString =
#
# have to do some trickery with garbage collection to avoid parsing
# the tree and handing back a node within that that will be freed
# when the top-level document is GC'ed
#
# If the caller gives us the target document, we parent the nodes
# into that and all is well.
# Otherwise, we have trouble and potential leak at present as we
# explicitly kill off the finalizer.
#
# This should be cured now in the XML package.
#
function(content, doc = NULL, namespaces = RXMLNamespaces, clean = TRUE, addFinalizer = NA)
{
f =
function(cdata = FALSE)
newXMLNode("para", newXMLTextNode(content, cdata = cdata, doc = doc), doc = doc)
# If the user has told us explicitly that this is not XML but raw text
# then put it into a para enclosed within CDATA, just in case.
if(inherits(content, "AsIs"))
return(f(TRUE))
if(!isXMLString(content))
return(f(TRUE))
content = as(content, "XMLString")
ns = paste(paste("xmlns", names(RXMLNamespaces), sep = ":"),
sprintf('"%s"', RXMLNamespaces), sep = "=", collapse = " ")
txt = paste('', content, "", sep = "")
local.doc = tryCatch(xmlParse(txt, addFinalizer = addFinalizer), # addFinalizer = !inherits(doc, "XMLInternalDocument")
error = function(e) e)
if(inherits(local.doc, "condition"))
return(f(TRUE))
tmp = xmlRoot(local.doc)
if(xmlSize(tmp) == 1) # inherits(tmp[[1]], "XMLInternalElementNode") && xmlName(tmp[[1]]) == "para")
tmp = tmp[[1]]
if(clean)
removeXMLNamespaces(tmp, .els = names(namespaces))
# XXX
if(inherits(doc, "XMLInternalDocument")) {
manageMemory = manageMemory_p(addFinalizer)
.Call("RS_XML_copyNodesToDoc", tmp, doc, addFinalizer, PACKAGE = "XML")
} else
tmp
}
XML/R/xincludes.R 0000644 0001751 0000144 00000006225 13607633674 013276 0 ustar hornik users
setGeneric("findXIncludeStartNodes",
function(doc, ...)
{
standardGeneric("findXIncludeStartNodes")
})
setMethod("findXIncludeStartNodes", "character",
function(doc, ...)
{
findXIncludeStartNodes(xmlParse(doc), ...)
})
setMethod("findXIncludeStartNodes", "XMLInternalDocument",
function(doc, ...)
{
findXIncludeStartNodes(xmlRoot(doc), ...)
})
setMethod("findXIncludeStartNodes", "XMLInternalElementNode",
function(doc, ...)
{
nodes = .Call("R_findXIncludeStartNodes", xmlRoot(doc), PACKAGE = "XML")
names(nodes) = sapply(nodes, xmlGetAttr, "href", NA)
nodes
})
findXInclude =
function(x, asNode = FALSE, recursive = FALSE)
{
while(!is.null(x)) {
tmp = getSiblingXIncludeStart(x, TRUE)
if(!is.null(tmp))
return(fixFindXInclude(tmp, asNode, recursive))
sib = x
if(is(sib, "XMLXIncludeStartNode"))
return(fixFindXInclude(sib, asNode, recursive)) # if(asNode) sib else xmlAttrs(sib))
x = xmlParent(x)
}
fixFindXInclude(x, asNode, recursive)
}
bad.findXInclude =
# This version just looks in the left sibling, not all siblings to the left.
function(x, asNode = FALSE, recursive = FALSE)
{
ans = NULL
while(!is.null(x)) {
prev = getSiblingXIncludeStart(x, FALSE)
if(inherits(prev, "XMLXIncludeStartNode")) {
ans = prev
break
}
x = xmlParent(x)
}
fixFindXInclude(ans, asNode, recursive)
}
fixFindXInclude =
function(ans, asNode = FALSE, recursive = FALSE)
{
if(is.null(ans))
return(NULL)
if(recursive) {
tmp = getXIncludePath(ans)
if(FALSE && grepl(sprintf("^(%s|http:|ftp:)", .Platform$file.sep), tmp))
tmp
else
sprintf("%s%s%s",
paste(dirname(unique(tmp)), collapse = .Platform$file.sep),
.Platform$file.sep,
xmlAttrs(ans))
} else
if(asNode) ans else xmlAttrs(ans)["href"]
}
getXIncludePath =
function(node)
{
x = xmlParent(node)
ans = character()
while(!is.null(x)) {
ans = c(ans, findXInclude(x))
prev = x
x = xmlParent(x)
}
c(docName(prev), ans)
}
getSiblingXIncludeStart =
function(x, asNode = FALSE)
{
sib = x
while(!is.null(sib)) {
if(inherits(sib, "XMLXIncludeEndNode"))
return(NULL)
if(inherits(sib, "XMLXIncludeStartNode"))
return(if(asNode) sib else xmlAttrs(sib))
sib <- getSibling(sib, FALSE)
}
NULL
}
getNodePosition =
function(x) {
if(is.list(x))
return(sapply(x, getNodePosition))
tmp = getNodeLocation(x)
sprintf("%s:%d", tmp$file[1], tmp$line)
}
getNodeLocation =
function(node, recursive = TRUE, fileOnly = FALSE)
{
if(is.list(node))
return(lapply(node, getNodeLocation, recursive, fileOnly))
fil = findXInclude(node, recursive = recursive)
if(is.null(fil))
fil = docName(node)
if(fileOnly)
fil[1]
else
list(file = fil, line = getLineNumber(node))
}
getLineNumber =
function(node, ...)
{
if(!is(node, "XMLInternalNode"))
stop("This must be an C-level/native/internal XML node, i.e. of class 'XMLInternalNode'. Got ", paste(class(node), collapse = ", "))
.Call("R_getLineNumber", node, PACKAGE = "XML")
}
XML/R/tangle1.R 0000644 0001751 0000144 00000001073 13607633674 012627 0 ustar hornik users tangleR = xxx_getRCode = # conflicts with getRCode in xmlInternalSource.R
function(doc, tags = c("code", "plot", "function"), out = gsub("\\.[a-zA-Z]+$", ".R", docName(doc)))
{
if(is.character(doc))
doc = xmlParse(doc)
xp = sprintf("//r:%s[not(@eval='false') and not(ancestor::section[@eval='false'])]",
tags)
code = xpathSApply(doc, paste(xp, collapse = " | "), xmlValue, namespaces = c("r" = "http://www.r-project.org"))
if(length(out) && !is.na(out)) {
cat(code, sep = "\n", file = out)
out
} else
code
}
XML/R/getRelativeURL.R 0000644 0001751 0000144 00000004031 13610555150 014111 0 ustar hornik users getRelativeURL =
#
# takes the name of a file/URL and a baseURL and
# figures out the URL for the new file given by u.
# This handles the case where the file/URL is relative to the
# the baseURL or if it is a fully qualified file or URL.
#
# getRelativeURL("/foo", "http://www.omegahat.net")
# getRelativeURL("/foo", "http://www.omegahat.net/")
# getRelativeURL("foo", "http://www.omegahat.net/")
# getRelativeURL("http://www.foo.org", "http://www.omegahat.net/")
#
# XXX test - baseURL with /path/ and u as /other/path. Looks okay. See
# ParsingStrategies example for kaggle.
# getRelativeURL("../foo/xyz/bar.html", "http://www.omegahat.net/a/b.html")
# getRelativeURL("./foo/xyz/bar.html", "http://www.omegahat.net/a/b.html")
# getRelativeURL("../foo/xyz/bar.html", "http://www.omegahat.net/a/b.html")
#
#
# BROKEN
# getRelativeURL("foo", ".") yields :///foo
#
#
# [Fixed] not working for ../...
# fails
# getRelativeURL("../foo", "http://www.omegahat.net/a/b.html")
# should be http://www.omegahat.net/foo
# or at least http://www.omegahat.net/a/../foo
function(u, baseURL, sep = "/", addBase = TRUE, simplify = TRUE, escapeQuery = FALSE)
{
if(length(u) > 1)
return(sapply(u, getRelativeURL, baseURL, sep))
pu = parseURI(u)
#XXX Need to strip the path in baseURL if pu$path starts with /
if(pu$scheme == "" && addBase) {
b = parseURI(baseURL)
b$query = ""
if(grepl("^/", pu$path)) {
b$path = u
return(as(b, "character"))
}
endsWithSlash = grepl("/$", b$path)
if(endsWithSlash && grepl("^\\./", u))
u = substring(u, 3)
b$path = sprintf("%s%s%s", if(endsWithSlash) b$path else dirname(b$path), if(endsWithSlash) "" else sep, u)
# handle .. in the path and try to collapse these.
if(simplify && grepl("..", b$path, fixed = TRUE))
b$path = simplifyPath(b$path)
return(as(b, "character"))
# b = as(b, "character")
# sprintf("%s%s%s", b, "" else sep, u)
} else
u
}
XML/R/simplifyPath.R 0000644 0001751 0000144 00000002560 13607633705 013742 0 ustar hornik users simplifyPath =
# Could use strsplit, etc.
# simplifyPath2("XMLBasics/../WebTechData/RawData/KyphosisRpartExtract.xml")
# simplifyPath2("../WebTechData/RawData/KyphosisRpartExtract.xml")
# simplifyPath2("../../WebTechData/RawData/KyphosisRpartExtract.xml")
# simplifyPath2("a/b/../../WebTechData/RawData/KyphosisRpartExtract.xml")
# simplifyPath2("top/a/b/../../WebTechData/RawData/KyphosisRpartExtract.xml")
# simplifyPath2("abc/../../WebTechData/RawData/KyphosisRpartExtract.xml")
function(path)
{
els = strsplit(path, "/")[[1]]
GoOn = TRUE
els = els[ els != "."]
while(GoOn && length(i <- which(els == ".."))) {
i = min(i)
if(length(i) == 1 && i == 1)
break
if(all(els[ seq( 1, i) ] == ".."))
break
if(i == 2 && els[1] == "..")
break
els = els[ - c(i, i - 1L) ]
}
paste(els, collapse = "/")
}
if(FALSE) {
simplifyPath =
function(path)
{
path = gsub("/\\./", "/", path)
path = gsub("^\\./", "", path)
# Doesn't handle "../foo"
while(grepl("[^./]/\\.\\.", path)) {
path = gsub("/[^/.]+/\\.\\./?", "/", path)
}
path = gsub("^(\\./)+", "", path)
path
}
simplifyPath1 =
# Could use strsplit, etc.
function(path)
{
els = strsplit(path, "/")[[1]]
while(length(i <- which(els == ".."))) {
i = max(i)
if(length(i) == 1 && i == 1)
break
i = i[i != 1]
}
paste(els, sep = "/")
}
}
XML/R/DTDClasses.R 0000644 0001751 0000144 00000004506 13607633665 013231 0 ustar hornik users #
# Some methods for the DTD classes, similar in spirit
# to those in XMLClasses
#
# print()
#
#
#
# XMLSystemEntity
# XMLEntity
# XMLElementDef
# XMLSequenceContent
# XMLOrContent
# XMLElementContent
# XMLAttributeDef
#
print.XMLElementDef <-
function(x, ...)
{
cat("\n")
if(length(x$attributes)) {
cat("\n")
}
}
print.XMLElementContent <-
function(x, ...)
{
if(names(x$type)[1] == "PCData") {
cat(" ( #PCDATA ) ")
return()
}
cat("(")
cat(x$elements)
cat(")",switch(names(x$ocur)[1],Once="", "One or More"="+","Zero or One"="?","Mult"="*"))
}
print.XMLOrContent <-
function(x, ...)
{
n <- length(x$elements)
cat("( ")
for(i in 1:n) {
print(x$elements[[i]])
if(i < n)
cat(" | ")
}
cat(" )")
}
print.XMLSequenceContent <-
function(x, ...)
{
cat("( ")
n <- length(x$elements)
for(i in 1:n) {
print(x$elements[[i]])
if(i < n)
cat(", ")
}
cat(" )")
}
print.XMLAttributeDef <-
function(x, ...)
{
if(names(x$defaultType)[1] != "Implied")
dflt <- paste("\"", x$defaultValue,"\"",collapse="",sep="")
else
dflt <- ""
cat(x$name, xmlAttributeType(x), xmlAttributeType(x, TRUE), dflt)
}
xmlAttributeType <-
function(def, defaultType = FALSE)
{
if(defaultType == FALSE & names(def$type)[1] == "Enumeration") {
return( paste("(",paste(def$defaultValue,collapse=" | "),")", sep=" ", collapse="") )
}
switch(ifelse(defaultType, names(def$defaultType)[1], names(def$type)[1]),
"Fixed" = "#FIXED",
"CDATA" = "CDATA",
"Implied" = "#IMPLIED",
"Required" = "#REQUIRED",
"Id" = "#ID",
"IDRef" = "#IDREF",
"IDRefs" = "#IDREFS",
"Entity" = "#ENTITY",
"Entities" = "ENTITIES",
"NMToken" = "#NMTOKEN",
"NMTokens" = "#NMTOKENS",
"Enumeration" = "",
"Notation" = "",
""
)
}
print.XMLEntity <-
function(x, ...)
{
cat("\n")
}
xmlAttrs.XMLElementDef <-
function(node, ...)
{
node$attributes
}
if(useS4) {
setGeneric("xmlAttrs", function(node, ...) standardGeneric("xmlAttrs"))
setMethod("xmlAttrs", "XMLElementDef", xmlAttrs.XMLElementDef)
}
XML/R/AAA.R 0000644 0001751 0000144 00000000016 13607633702 011642 0 ustar hornik users useS4 = FALSE
XML/R/htmlLinks.R 0000644 0001751 0000144 00000002276 13607633670 013243 0 ustar hornik users readHTMLLinks = getHTMLLinks =
function(doc, externalOnly = TRUE, xpQuery = "//a/@href", baseURL = docName(doc),
relative = FALSE)
{
if(is.character(doc))
doc = htmlParse(doc)
# put a . in front of the xpQuery if we have a node rather than a document.
if(is(doc, "XMLInternalNode") && grepl("^/", xpQuery))
xpQuery = sprintf(".%s", xpQuery)
links = as.character(getNodeSet(doc, xpQuery))
links = if(externalOnly)
grep("^#", links, value = TRUE, invert = TRUE)
else
links
#XXX Put base URL onto these links, relative!
if(relative)
sapply(links, getRelativeURL, baseURL)
else
links
}
getHTMLExternalFiles =
function(doc, xpQuery = c("//img/@src", "//link/@href", "//script/@href", "//embed/@src"),
baseURL = docName(doc), relative = FALSE, asNodes = FALSE, recursive = FALSE)
{
if(is.character(doc))
doc = htmlParse(doc)
if(asNodes)
xpQuery = gsub("/@[a-zA-Z-]$+", "", xpQuery)
nodes = getNodeSet(doc, xpQuery)
if(asNodes)
return(nodes)
nodes = as.character(nodes)
ans = if(relative)
getRelativeURL(nodes, baseURL)
else
nodes
# recursive.
ans
}
XML/R/xmlInternalSource.R 0000644 0001751 0000144 00000053765 13610555150 014753 0 ustar hornik users
#XXX Deal with line numbers in the original document.
#
# We could also do things this way
# source(textConnection(saveXML(xsltApplyStyleSheet("cityTemperatures.xml", "~/Projects/org/omegahat/XML/Literate/segment.xsl"))))
#
#
# Allow the user to specify a subset of nodes in which to find the code, etc. nodes.
# Or an XPath query to restrict the search.
# For example, suppose we have a document with two sections and we want to run the code
# in only one of those sections.
# getNodeSet(section[@id='second one'])
#
setOldClass("XMLNodeSet")
DefaultXPathNamespaces =
c(r = "http://www.r-project.org",
s = "http://cm.bell-labs.com/stat/S4",
omg = "http://www.omegahat.net",
mlb = "http://www.mathworks.com", # matlab
sh="http://www.shell.org",
perl = "http://www.perl.org",
py = "http://www.python.org",
fo="http://www.w3.org/1999/XSL/Format",
xsl="http://www.w3.org/1999/XSL/Transform",
xi="http://www.w3.org/2001/XInclude"
)
DefaultXMLSourceXPath =
sprintf("%s[not(@eval='false') and not(ancestor::ignore) and not(ancestor::section[@r:eval = 'false'])]",
c("//r:init", "//r:function", "//r:init", "//r:code", "//r:plot", "//r:expr"))
setGeneric("xmlSource",
function(url, ...,
envir = globalenv(),
xpath = character(),
ids = character(),
omit = character(),
ask = FALSE,
example = NA,
fatal = TRUE, verbose = TRUE, echo = verbose, print = echo,
xnodes = DefaultXMLSourceXPath,
namespaces = DefaultXPathNamespaces, section = character(), eval = TRUE, init = TRUE,
setNodeNames = FALSE, parse = TRUE, force = FALSE)
{
standardGeneric("xmlSource")
})
# Break up into methods for character and xmlInternalDocument
setMethod("xmlSource", c("character"),
function(url, ...,
envir =globalenv(),
xpath = character(),
ids = character(),
omit = character(),
ask = FALSE,
example = NA,
fatal = TRUE, verbose = TRUE, echo = verbose, print = echo,
xnodes = DefaultXMLSourceXPath,
namespaces = DefaultXPathNamespaces,
section = character(), eval = TRUE, init = TRUE, setNodeNames = FALSE, parse = TRUE, force = FALSE)
{
doc = xmlTreeParse(url, ..., useInternalNodes = TRUE)
xmlSource(doc, ..., envir = envir, xpath = xpath, ids = ids, omit = omit,
ask = ask, example = example, fatal = fatal, verbose = verbose,
print = print, xnodes = xnodes, namespaces = namespaces,
section = section, eval = eval, init = init, setNodeNames = setNodeNames, parse = parse, force = force)
})
setMethod("xmlSource", c("XMLInternalDocument"),
function(url, ...,
envir =globalenv(),
xpath = character(),
ids = character(),
omit = character(),
ask = FALSE,
example = NA,
fatal = TRUE, verbose = TRUE, echo = verbose, print = echo,
xnodes = DefaultXMLSourceXPath,
namespaces = DefaultXPathNamespaces,
section = character(), eval = TRUE, init = TRUE, setNodeNames = FALSE, parse = TRUE, force = FALSE)
{
doc = url
if(inherits(verbose, "numeric"))
verbose = verbose - 1
if(!is.character(section))
section = as.integer(section)
#XXX use section when processing the examples
if(length(example) && !all(is.na(example))) {
egs = getNodeSet(doc, "//r:example", namespaces)
if(length(egs)) {
ids = sapply(egs, xmlGetAttr, "id")
if(length(example) == 1 && is.na(example)) {
cat("Select an example\n")
example = ids[w <- menu(ids)]
}
if(inherits(example, "numeric")) {
i = example
} else {
i = pmatch(example, ids)
if(all(is.na(i)))
stop("no example named ", example)
}
# find any r:init nodes which are not inside an example.
init = getNodeSet(doc, "//r:init[not(ancestor::r:example)]",
c(r = "http://www.r-project.org"))
if(length(init)) {
xmlSource(init, envir = envir, omit = omit, verbose = verbose, namespaces = namespaces, eval = eval, force = force)
cat("Done doc-level init", length(init), "\n")
}
ans = sapply(i, function(x) {
nodes = getNodeSet(egs[[x]], paste(xnodes, collapse = "|"), namespaces)
if(verbose)
cat("Example", ids[x], "\n")
#XXX put the correct ids in her.
xmlSource(nodes, envir = envir, omit = omit, verbose = verbose, namespaces = namespaces, eval = eval, setNodeNames = setNodeNames, parse = parse, force = force)
})
return(ans)
}
}
# if(length(section) && is.character(section))
# section = paste("@id", ddQuote(section), sep = "=")
if(length(xpath)) {
# do an XPath query and then look inside the resulting nodes
# for the xnodes of interest.
if(length(section)) {
# XXX assumes just one section. What about c(1, 2, 4)
xpath =paste("//section[", section, "]", xpath, sep = "")
}
nodes = getNodeSet(doc, xpath, namespaces)
v =
unlist(lapply(nodes, function(n) {
unlist(lapply(xnodes,
function(p)
getNodeSet(n, p, namespaces)),
recursive = FALSE)
}), recursive = FALSE)
} else {
functions = limitXPathToSection(section, "//r:function[not(@eval = 'false') and not(ancestor::ignore)]")
xnodes = limitXPathToSection(section, xnodes)
# Do we need to ensure the order for the functions first?
v = getNodeSet(doc, paste(c(functions, xnodes), collapse = "|"), namespaces)
# v = getNodeSet(doc, functions, namespaces)
# w = getNodeSet(doc, xnodes, namespaces)
# v = c(v, w)
}
if(is.null(v))
stop("No matching nodes in the document found")
class(v) <- "XMLNodeSet"
# deal with a top-level node r:codeIds which is of the form
# abc
# def
# ghi
# i.e. a single entry on each line which identifies the nodes that are to be read.
if(missing(ids) && missing(xnodes) && length(ids <- getNodeSet(doc, "/*/r:codeIds|/*/invisible/r:codeIds",
namespaces = c(r = "http://www.r-project.org")))) {
if(length(ids) > 1) {
warning("more than one r:codeIds node. Using the first one")
}
# txt = paste(sapply(ids, xmlValue))
ids = strsplit(xmlValue(ids[[1]]), "\\\n")[[1]]
ids = unique(ids)
ids = ids[ids != ""]
}
xmlSource(v, ids = ids, omit = omit, ask = ask, fatal = fatal, verbose = verbose, envir = envir,
section = if(!is.character(section)) section else character(),
eval = eval, setNodeNames = setNodeNames, parse = parse, force = force)
})
limitXPathToSection =
#
# limitToSection(1:3)
# limitToSection(letters[1:3])
# limitToSection(letters[1:3], "//r:plot")
function(section, xpath = c("//r:code", "//r:func", "//r:plot", "//r:expr"))
{
if(length(section) == 0)
return(paste(xpath, collapse = "|"))
if(is.character(section))
section = paste("@id=", sQuote(section), sep = "")
paste(outer(section, xpath,
function(sect, xp)
paste("//section[", sect, "]", xp, sep = "")),
collapse = "|")
}
setMethod("xmlSource", "XMLNodeSet",
function(url, ..., envir =globalenv(),
xpath = character(),
ids = character(),
omit = character(),
ask = FALSE,
example = NA,
fatal = TRUE, verbose = TRUE, echo = verbose, print = echo,
xnodes = c("r:function[not(@val='false')]", "r:init[not(@eval='false')]", "r:code[not(@eval='false')]", "//r:plot[not(@eval='false')]"),
namespaces = DefaultXPathNamespaces, section = character(), eval = TRUE, init = TRUE, setNodeNames = FALSE, parse = TRUE, force = FALSE)
{
if(ask) {
doc = as(url[[1]], "XMLInternalDocument") #XXXX no doc here now.
v = getNodeSet(doc, "//r:function|//r:init|//r:code|//r:plot", namespaces)
funs = sapply(v, xmlName) == "function"
if(any(funs)) {
#XXX
}
}
ans = sapply(url, evalNode, envir = envir, verbose = verbose, ids = ids,
omit = omit, echo = echo, print = print, ask = ask, eval = eval, parse = parse, force = force)
if(setNodeNames)
names(ans) = sapply(url, getRCodeNodeName)
else
names(ans) = sapply(url, getNodePosition) #sapply(url, xmlName, full = TRUE)
invisible(ans)
})
evalNode =
function(node, envir = globalenv(), ids = character(), verbose = FALSE, echo = verbose, omit = character(),
namespaces = c(r = "http://www.r-project.org"), print = echo, ask = FALSE, eval = TRUE, parse = TRUE, force = FALSE)
{
#XXX check all ancestors. Ideally exclude them in the XPath query
if(!force && (xmlName(xmlParent(node)) == "ignore" ||
length(getNodeSet(node, "./ancestor::section[@r:eval='false']|./ancestor::para[@r:eval='false']",
c(r = "http://www.r-project.org"))) > 0))
return(FALSE)
tmp = xmlGetAttr(node, "id", NA)
if(is.na(tmp) && length(ids) > 0 && !("" %in% ids))
return()
if(!is.na(tmp)) {
if(length(omit) > 0 && tmp %in% omit) {
if(verbose)
warning("skipping id ", tmp)
return()
} else if(length(ids) > 0 && !(tmp %in% ids)) {
if(verbose)
warning("ignoring id ", tmp)
return()
}
}
tmp = xmlGetAttr(node, "ignore", NA, converter = as.logical)
if(!is.na(tmp) && tmp) {
if(verbose)
warning("ignoring node", as(node, "character"))
return()
}
# go through the node and see if there are any r:code nodes
# with a ref attribute
# and go fetch the corresponding node.
txt = paste(getRCode(node, namespaces), collapse = "\n")
if(!parse)
return(txt)
# txt = xmlValue(node)
if(verbose)
cat("*************\nEvaluating node\n")
cmd = parse(text = txt)
if(echo)
print(cmd)
if(eval) {
if(ask) {
w = utils::menu(c("evaluate", "skip", "terminate"))
if(w == 2)
return(NULL)
else if(w == 3)
stop("User terminated the xmlSource")
}
isPlot = xmlName(node) == "plot"
if(isPlot) {
f = xmlGetAttr(node, "img")
if(!is.null(f)) {
attrs = xmlAttrs(node)
dev = openDevice(f, attrs)
}
if(!xmlGetAttr(node, "continuePlot", FALSE, as.logical))
on.exit(dev.off())
}
ans = eval(cmd, envir)
if(isPlot && inherits(ans, "trellis"))
print(ans)
ans
} else
cmd
}
openDevice =
function(f, attrs)
{
if("format" %in% names(attrs))
ext = attrs["format"]
else
ext = getExtension(f)
fun = switch(ext, png = png, jpeg = jpeg, pdf = pdf)
args = lapply(c("width", "height"), getDevAttr, attrs, fun, as.numeric)
cat("opening device for", f, "\n")
fun(f)
}
getDevAttr =
function(name, attrs, devFun, converter = as.character)
{
if(name %in% names(attrs))
converter(attrs[[name]])
else if(name %in% names(formals(devFun)))
formals(devFun)[[name]]
else
converter(NA)
}
getExtension =
function(f)
{
gsub(".*\\.", "", basename(f))
}
getRCode =
function(node, namespaces = c(r = "http://www.r-project.org"), recursive = TRUE,
dropOutput = FALSE)
{
tmp = xmlSApply(node, function(x) {
if(inherits(x, c("XMLInternalCommentNode", "XMLInternalPINode"))) {
} else if(inherits(x, "XMLInternalElementNode") && xmlName(x, full = TRUE) %in% c("r:code", "r:frag")) {
ref = xmlGetAttr(x, "ref", NA)
if(!is.na(ref)) {
v = getNodeSet(as(x, "XMLInternalDocument"),
paste(sapply(c("code", "frag"),
function(x) paste("//r:", x , "[@id='", ref, "']", sep = "")), collapse = "|"),
namespaces)
if(length(v) == 0)
stop("No code block/fragment named ", ref)
else if(length(v) > 1)
stop("More than 1 code block/fragment named ", ref)
else
if(recursive)
getRCode(v[[1]], namespaces, recursive = TRUE, dropOutput = dropOutput)
else
xmlValue(v[[1]])
} else {
if(recursive)
getRCode(x, namespaces, recursive = TRUE, dropOutput = dropOutput)
else
xmlValue(x)
}
} else if(inherits(x, "XMLInternalElementNode") && xmlName(x, full = TRUE) %in% c("r:error", "r:output")) {
} else
xmlValue(x)
})
if(dropOutput && length(names(tmp)))
tmp = tmp[names(tmp) != "output"]
paste(tmp, collapse = "\n")
}
setClass("XMLCodeFile", contains = "character")
setClass("XMLCodeDoc", contains = "XMLInternalDocument")
setAs("XMLCodeFile", "XMLCodeDoc",
function(from) {
new("XMLCodeDoc", xmlParse(from))
})
setAs("character", "XMLCodeFile",
function(from) {
xmlCodeFile(from)
})
setAs("character", "XMLCodeDoc",
function(from) {
xmlCodeFile(from, TRUE)
})
xmlCodeFile =
function(f, parse = FALSE)
{
if(parse)
new("XMLCodeDoc", xmlParse(f))
else
new("XMLCodeFile", f)
}
utils::globalVariables("use_file")
tmp.source =
function (file, local = FALSE, echo = verbose, print.eval = echo,
verbose = getOption("verbose"), prompt.echo = getOption("prompt"),
max.deparse.length = 150, chdir = FALSE, encoding = getOption("encoding"),
continue.echo = getOption("continue"), skip.echo = 0,
keep.source = getOption("keep.source"))
{
if(length(verbose) == 0)
verbose = FALSE
if(chdir) {
cwd = getwd()
on.exit(setwd(cwd))
setwd(dirname(file))
}
xmlSource(file, verbose = verbose)
}
## This version would require us to document source()
## setGeneric("source", function(file, ...) standardGeneric("source"))
## tmp.source =
## function (file, verbose = getOption("verbose"), chdir = FALSE, ...)
## {
## if(length(verbose) == 0)
## verbose = FALSE
## if(chdir) {
## cwd = getwd()
## on.exit(setwd(cwd))
## setwd(dirname(file))
## }
## xmlSource(file, verbose = verbose)
## }
setMethod("source", "XMLCodeFile", tmp.source)
setMethod("[[", "XMLCodeFile",
function(x, i, j, ..., env = globalenv()) {
doc = as(x, "XMLCodeDoc")
n = getNodeSet(doc, paste("//*[@id=", sQuote(i), "]"))
if(length(n) == 0) {
# This needs code from ptoc to determine the name of an "element"
## was updateIds(doc, save = x), which was giving
## byte-compilation warnings in *other* packages.
doc = updateIds(doc)
}
eval(parse(text = xmlValue(n[[1]])), envir = env)
})
updateIds =
function(doc, ...)
{
nodes = getNodeSet(doc,
"//r:function[not(@id) and not(@eval = 'false')]|//r:code[not(@id) and not(@eval = 'false')]",
c("r" = "http://www.r-project.org"))
sapply(nodes, getCodeVar)
}
getCodeVar =
function(node)
{
e = parse(text = getRCode(node))
e = e[[length(e)]]
# This should use the code in ptoc in RTools.
id = if(class(e) %in% c("=", "<-"))
as.character(e[[2]])
else
NA
if(!is.na(id))
addAttributes(node, id = id)
id
}
#
# f = xmlCodeFile("~/Classes/stat242-08/Code/FormulaWeights/rpartScope.xml")
# source(f)
# f[["rpart.model"]]
setGeneric("xmlSourceFunctions",
function(doc, ids = character(), parse = TRUE, ...) {
standardGeneric("xmlSourceFunctions")
})
setMethod("xmlSourceFunctions", "character",
#
# evaluate the r:function nodes, or restricted to @id from ids.
#
function(doc, ids = character(), parse = TRUE, ...)
{
invisible(xmlSourceFunctions(xmlParse(doc), ids, parse = parse, ...))
})
sQuote =
function(x)
sprintf("'%s'", as.character(x))
setMethod("xmlSourceFunctions", "XMLInternalDocument",
#
# evaluate the r:function nodes, or restricted to @id from ids.
#
function(doc, ids = character(), parse = TRUE, setNodeNames = FALSE, ...)
{
if(length(ids))
nodes = getNodeSet(doc, paste("//r:function[", paste("@id", sQuote(ids), sep = "=", collapse = " or " ), "]"), c(r = "http://www.r-project.org"))
else
nodes = getNodeSet(doc, "//r:function[not(ancestor-or-self::*/@eval = 'false')]", c(r = "http://www.r-project.org"))
if(parse == FALSE)
return(nodes)
ans = xmlSource(nodes, ...)
if(setNodeNames)
names(ans) = sapply(nodes, getRCodeNodeName)
invisible(ans)
})
getRCodeNodeName =
function(node)
{
xmlGetAttr(node, "name", xmlGetAttr(node, "idx", getTaskId(node)))
}
################
setGeneric("xmlSourceSection",
function(doc, ids = character(),
xnodes = c(".//r:function", ".//r:init[not(@eval='false')]", ".//r:code[not(@eval='false')]", ".//r:plot[not(@eval='false')]"), namespaces = DefaultXPathNamespaces, ...)
standardGeneric("xmlSourceSection"))
setMethod("xmlSourceSection", "character",
function(doc, ids = character(), xnodes = c(".//r:function", ".//r:init[not(@eval='false')]", ".//r:code[not(@eval='false')]", ".//r:plot[not(@eval='false')]"), namespaces = DefaultXPathNamespaces, ...)
xmlSourceSection(xmlParse(doc), ids, xnodes, namespaces, ...))
setMethod("xmlSourceSection", "XMLInternalDocument",
function(doc, ids = character(),
xnodes = c(".//r:function", ".//r:init[not(@eval='false')]", ".//r:code[not(@eval='false')]", ".//r:plot[not(@eval='false')]"),
namespaces = DefaultXPathNamespaces, ...) {
nodes = getNodeSet(doc, "//section")
aids = sapply(nodes, xmlGetAttr, "id", NA)
m = pmatch(ids, aids)
if(any(is.na(m))) {
# for those ids the caller gave us that didn't match
# compare these to the titles.
i = which(is.na(m))
tmp = ids[i]
j = pmatch(ids[i], sapply(nodes, function(x) {
tmp = getNodeSet(x, "./title")
if(length(tmp))
xmlValue(tmp[[1]])
else
""
}))
m[i [!is.na(j)]] = j[!is.na(j)]
}
if(any(is.na(m)))
stop("cannot match section id or title for ", paste(m[is.na(m)], collapse = ", "))
lapply(nodes[m], evalSection, xnodes, namespaces, ...)
})
evalSection =
function(node, xnodes, namespaces = DefaultXPathNamespaces, envir = globalenv(), ...)
{
# Or use xnodes by stripping away any [] and .//
if(xmlName(node, TRUE) %in% c("r:function", "r:plot", "r:code", "r:graphics"))
return(evalNode(node, envir, ...))
xpath = paste(xnodes, collapse = "|")
nodes = getNodeSet(node, xpath, namespaces)
sapply(nodes, evalNode, envir, ...)
}
#############################################
# r:code[@thread='name']|r:code[ancestor::*[@thread='name']]
setGeneric("xmlSourceThread",
function(doc, id, envir = globalenv(), ...,
xnodes = c("r:function", "r:init", "r:code", "r:plot"))
standardGeneric("xmlSourceThread"))
setMethod("xmlSourceThread", "character",
function(doc, id, envir = globalenv(), ...,
xnodes = c("r:function", "r:init", "r:code", "r:plot"))
xmlSourceThread(xmlParse(doc), id, envir, ..., xnodes = xnodes)
)
setMethod("xmlSourceThread", "list",
function(doc, id, envir = globalenv(), ...,
xnodes = c("r:function", "r:init", "r:code", "r:plot"))
sapply(doc, evalNode, envir = envir, ..., xnodes = xnodes))
if(FALSE)
setMethod("xmlSourceThread", "XMLNodeList",
function(doc, id, envir = globalenv(), ...,
xnodes = c(".//r:function", ".//r:init[not(@eval='false')]", ".//r:code[not(@eval='false')]", ".//r:plot[not(@eval='false')]"))
sapply(doc, evalNode, envir = envir, ...))
setMethod("xmlSourceThread", "XMLInternalDocument",
function(doc, id, envir = globalenv(), ...,
xnodes = c("r:function", "r:init", "r:code", "r:plot")) {
# all the nodes that are "under" this thread.
xp = sprintf("//*[@thread='%s']", id)
anc = sprintf("//%s[not(ancestor::*[@thread]) and not(ancestor::altApproach)]", xnodes)
xp = paste(c(xp, anc), collapse = " | ")
nodes = getNodeSet(doc, xp)
sapply(nodes, evalSection, envir = envir, ..., xnodes = xnodes)
})
setGeneric("xmlSourceTask",
function(doc, id, envir = globalenv(), ...) {
standardGeneric("xmlSourceTask")
}
)
# nodes = c("r:code", "r:plot", "r:expr")
# fmt = paste(nodes, "[@thread='%s']", sep = "")
# xp = paste(sprintf(fmt, id), collapse = " | ")
# paste(nodes[]
# getNodeSet
# "r:code[@thread='%s']|r:plot[@thread='%s']|r:expr['
# paste(getNode
# })
# See tangle.R and xmlTangle.
# Fix this up. Just thrown down one morning.
xmlToCode = tangle =
function(doc, file = stdout())
{
e = xmlSourceFunctions(doc, eval = FALSE) # want to avoid parsing.
if(!is(file, "connection"))
con = file(file, "w")
sapply(e, function(x) cat(x, "\n", file = file))
file
}
getTaskId =
function(node) {
els = getNodeSet(node, ".//ancestor::task")
if(length(els))
xmlGetAttr(els[[1]], "id")
else
""
}
XML/R/xmlEventParse.R 0000644 0001751 0000144 00000012047 13610555150 014056 0 ustar hornik users
GeneralHandlerNames =
list(SAX = c("text", "startElement", "endElement", "comment",
"startDocument", "endDocument",
"processingInstruction", "entityDeclaration", "externalEntity"),
DOM = c("text", "startElement", "comment", "entity", "cdata",
"processingInstruction"))
checkHandlerNames =
function(handlers, id = "SAX")
{
if(is.null(handlers))
return(TRUE)
ids = names(handlers)
i = match(ids, GeneralHandlerNames)
prob = any(!is.na(i))
if(prob) {
warning("future versions of the XML package will require names of general handler functions to be prefixed by a . to distinguish them from handlers for nodes with those names. This _may_ affect the ", paste(names(handlers)[!is.na(i)], collapse = ", "))
}
if(any(w <- !sapply(handlers, is.function)))
warning("some handlers are not functions: ", paste(names(handlers[w]), collapse = ", "))
!prob
}
xmlEventParse <-
#
# Parses an XML file using an event parser which calls user-level functions in the
# `handlers' collection when different XML nodes are encountered in the parse stream.
#
# See also xmlParseTree()
#
function(file, handlers = xmlEventHandler(), ignoreBlanks = FALSE, addContext = TRUE,
useTagName = TRUE, asText = FALSE, trim=TRUE, useExpat = FALSE,
isURL=FALSE, state = NULL,
replaceEntities = TRUE, validate = FALSE, saxVersion = 1,
branches = NULL, useDotNames = length(grep("^\\.", names(handlers))) > 0,
error = xmlErrorCumulator(), addFinalizer = NA, encoding = character())
{
if(libxmlVersion()$major < 2 && !is.character(file))
stop("Without libxml2, the source of the XML can only be specified as a URI.")
i = grep("^/", names(handlers))
if(length(i)) {
endElementHandlers = handlers[i]
names(endElementHandlers) = gsub("^/", "", names(endElementHandlers))
handlers = handlers[ - i]
} else
endElementHandlers = list()
checkHandlerNames(handlers, "SAX")
if(validate)
warning("Currently, libxml2 does support validation using SAX/event-driven parsing. It requires a DOM.")
else {
oldValidate = xmlValidity()
xmlValidity(validate)
on.exit(xmlValidity(oldValidate))
}
if(!any(saxVersion == c(1, 2))) {
stop("saxVersion must be 1 or 2")
}
if(inherits(file, "connection")) {
con = file
if(!isOpen(file)) {
open(file, "r")
on.exit(close(con))
}
leftOver = ""
file = function(len) {
if(nchar(leftOver) > 0) {
txt = leftOver
} else {
# txt = readBin(con, "", n = len - 1L)
txt = readLines(con, 1)
}
if(length(txt) == 0)
return(txt)
if(len < nchar(txt, "bytes")) {
tmp = mkSubstringByBytes(txt, len)
leftOver <<- tmp[2] # substring(txt, len - 1)
txt =tmp[1] # substring(txt, 1, len - 2)
} else
leftOver <<- ""
paste(txt, "\n", sep = "")
}
} else if(is.function(file)) {
# call with -1 to allow us to close the connection
# if necessary.
on.exit(file(-1))
} else {
if(!asText && missing(isURL)) {
# check if this is a URL or regular file.
isURL <- length(grep("http://",file)) | length(grep("ftp://",file)) | length(grep("file://",file))
}
if(isURL == FALSE && asText == FALSE) {
file = path.expand(file)
if(file.exists(file) == FALSE)
stop(paste("File", file, "does not exist "))
}
file = as.character(file)
}
branches = as.list(branches)
if(length(branches) > 0 && (length(names(branches)) == 0 || any(names(branches) == "")))
stop("All branch elements must have a name!")
old = setEntitySubstitution(replaceEntities)
on.exit(setEntitySubstitution(old))
if(!is.function(error))
stop("error must be a function")
.oldErrorHandler = setXMLErrorHandler(error)
on.exit(.Call("RS_XML_setStructuredErrorHandler", .oldErrorHandler, PACKAGE = "XML"), add = TRUE)
state <- .Call("RS_XML_Parse", file, handlers, endElementHandlers,
as.logical(addContext), as.logical(ignoreBlanks),
as.logical(useTagName), as.logical(asText), as.logical(trim),
as.logical(useExpat), state, as.logical(replaceEntities),
as.logical(validate), as.integer(saxVersion), branches, as.logical(useDotNames), error,
addFinalizer, as.character(encoding),
PACKAGE = "XML")
if(!is.null(state))
return(state)
else
return(invisible(handlers))
}
mkSubstringByBytes =
function(txt, nbytes)
{
letters = strsplit(txt, "")[[1]]
nb = nchar(letters, "bytes")
i = which(cumsum(nb) >= nbytes)[1] - 1
c(paste(letters[1:i], collapse = ""), paste(letters[-(1:i)], collapse = ""))
}
xmlStopParser =
function(parser)
{
if(!inherits(parser, "XMLParserContext"))
stop("Need an XMLParserContext object for xmlStopParser")
.Call("RS_XML_xmlStopParser", parser, PACKAGE = "XML")
}
xmlParserContextFunction =
function(f, class = "XMLParserContextFunction")
{
class(f) = c(class, class(f))
f
}
XML/R/fixNS.R 0000644 0001751 0000144 00000002232 13607633702 012311 0 ustar hornik users MissingNS = c(gating = "http://www.crap.org",
'data-type' = "http://www.morecrap.org")
fixXMLNamespaces =
#
# call as
# dd = fixXMLNamespaces("~/v75_step6.wsp", .namespaces = MissingNS)
# or
# dd = fixXMLNamespaces("~/v75_step6.wsp", gating = "http://www.crap.org", 'data-type' = "http://www.morecrap.org")
#
function(doc = "~/v75_step6.wsp", ..., .namespaces = list(...))
{
# collect the error messages
e = xmlErrorCumulator(, FALSE)
doc = xmlParse(doc, error = e)
if(length(e) == 0)
return(doc)
# find the ones that refer to prefixes that are not defined
ns = grep("^Namespace prefix .* not defined", unique(environment(e)$messages), value = TRUE)
ns = unique(gsub("Namespace prefix ([^ ]+) .*", "\\1", ns))
# now set those name spaces on the root of the document
if(is(.namespaces, "list"))
.namespaces = structure(as.character(unlist(.namespaces)), names = names(.namespaces))
uris = .namespaces[ns]
if(length(uris)) {
mapply(function(id, uri)
newXMLNamespace(xmlRoot(doc), uri, id),
names(uris), uris)
xmlParse(saveXML(doc), asText = TRUE)
} else
doc
}
XML/R/namespaces.R 0000644 0001751 0000144 00000015200 13607633667 013412 0 ustar hornik users setGeneric("simplifyNamespaces",
function(doc, ...)
standardGeneric("simplifyNamespaces"))
setMethod("simplifyNamespaces", "character",
function(doc, ...) {
pdoc = xmlParseDoc(doc, NSCLEAN)
simplifyNamespaces(pdoc, ...)
})
xmlCleanNamespaces =
#
# @eg xmlParse("~/GitWorkingArea/XML/inst/exampleData/redundantNS.xml")
#
# ?Should we write the result to a file if we are given a file?
#
#
function(doc, options = integer(), out = docName(doc), ...)
{
if(is(doc, "XMLInternalDocument"))
doc = saveXML(doc)
options = unique(c(options, NSCLEAN))
newDoc = xmlParse(doc, ..., options = options)
if(is.logical(out))
out = if(out) docName(doc) else character()
if(is.character(out) && length(out))
saveXML(newDoc, out)
else
newDoc
}
setMethod("simplifyNamespaces", "XMLInternalDocument",
function(doc, alreadyCleaned = FALSE, ...) {
# find all the nodes, but discard the root node.
allNodes = getNodeSet(doc, "//node()") # [-1]
root = xmlRoot(doc)
# For each node, get its namespace definitions,
# and then zoom in on the nodes that have namespace definitions.
nsDefs = lapply(allNodes, xmlNamespaceDefinitions, simplify = TRUE)
w = sapply(nsDefs, length) > 0
tmp = structure(unlist(nsDefs[w]), names = sapply(nsDefs[w], names))
d = data.frame(uri = tmp, prefix = names(tmp), stringsAsFactors = FALSE)
multi = unlist(by(d, d$prefix, function(x) if(length(unique(x$uri)) == 1) character() else x$prefix[1]))
if(length(multi))
d = d[ ! (d$prefix %in% multi), ]
# Now we can move these namespace definitions to the top.
#
#
#
by(d, nsDefs,
function(x) {
u = unique(x$prefix)
})
# remove the
sapply(allNodes[w], removeXMLNamespaces)
nsDefs
})
getNodeNamespace =
# Figure out what namespace to use for this node and return a reference to that
# namespace definition object in C (a xmlNsPtr)
function(ns, nsDefs, node, namespace, noNamespace, namespaceDefinitions = NULL, parent = NULL,
suppressNamespaceWarning = FALSE)
{
if(noNamespace)
return(NULL)
if(is.character(namespace) && length(namespace) == 1L && !is.na(namespace) && namespace == "") {
if(length(namespaceDefinitions) == 0)
return(findNamespaceDefinition(node, ""))
}
if((is.list(namespace) || is.character(namespace)) && length(namespace) > 0) {
# a single element with no name so this is the prefix.
if(length(namespace) == 1 && length(names(namespace)) == 0) {
if(namespace %in% namespaceDefinitions) {
i = match(namespace, namespaceDefinitions)
ns = nsPrefix = names(namespaceDefinitions)[i]
} else if(namespace != "") {
ns = nsPrefix = namespace
}
} else {
# we have names and/or more than one element. So these are namespace definitions
if(length(names(namespace)) == 0)
names(namespace) <- rep("", length(namespace))
if(length(namespace) > 1 && !is.na(match(namespace[1], names(namespace)[-1]))) {
if(length(ns))
warning("ignoring first element of namespace and using prefix from node name, ", ns)
else {
ns = namespace[1]
namespace = namespace[-1]
}
}
if(length(namespace) > 1 && sum(names(namespace) == "") > 1)
warning("more than one namespace to use as the default")
nsDefs = lapply(seq(along = namespace),
function(i) {
prefix = names(namespace)[i]
newNamespace(node, namespace[[i]], prefix)
# Don't set the namespace. This is just a definition/declaration for
# this node, but not necessarily the namespace to use for this node.
# We set this below
})
names(nsDefs) = names(namespace)
}
}
# Now handle the prefix for this node.
if(length(ns)) {
i = match(ns, names(nsDefs))
if(is.na(i)) {
if(!is.null(parent))
ns = findNamespaceDefinition(node, ns)
else {
# raiseNsWarning(ns, suppressNamespaceWarning)
# attr(node, "xml:namespace") = ns
# ns = NULL
ns = newNamespace(node, character(), ns)
}
if(!inherits(ns, "XMLNamespaceRef"))
ns <- newNamespace(node, ns, "")
} else
ns <- nsDefs[[i]]
} else {
i = match("", names(nsDefs))
ns = if(is.na(i)) NULL else nsDefs[[i]]
# if now namespace and we have a parent, use its namespace
# if it has a namespace
if(!noNamespace && length(ns) == 0 && length(parent) > 0) {
ns = xmlNamespaceRef(parent)
if(!is.null(ns) && names(as(ns, "character")) != "")
ns = NULL
}
}
ns
}
raiseNsWarning =
function(ns, suppressNamespaceWarning)
{
if(is.character(suppressNamespaceWarning))
f = get(suppressNamespaceWarning, mode = "function")
else if(is.logical(suppressNamespaceWarning)) {
if(!suppressNamespaceWarning)
f = warning
else
return(NULL)
} else
f = function(...) {}
f("cannot find namespace definition for '", ns, "' because the node is not in a document and there are no matching local namespace definitions for this node")
}
fixDummyNS =
function(node, suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE))
{
return(NULL)
nodes = getNodeSet(node, "//*[./namespace::*[. = '']]", addFinalizer = FALSE)
lapply(nodes, completeDummyNS, suppressNamespaceWarning = suppressNamespaceWarning)
}
completeDummyNS =
function(node, suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE))
{
if(is.null(xmlParent(node)))
return(FALSE)
prefix = names(xmlNamespace(node))
ns = findNamespaceDefinition(xmlParent(node), prefix, error = FALSE)
if(is.null(ns))
raiseNsWarning(prefix, suppressNamespaceWarning)
# (if(suppressNamespaceWarning) warning else stop)("can't find namespace definition for prefix ", prefix)
else {
# remove the current namespace definition and kill it.
.Call("R_replaceDummyNS", node, ns, prefix, PACKAGE = "XML")
# setXMLNamespace(node, ns)
}
}
XML/R/zzz.R 0000644 0001751 0000144 00000002270 13607633674 012131 0 ustar hornik users if(FALSE) {
.First.lib <-
function(libname, pkgname)
{
library.dynam("XML", pkgname, libname)
if(.useNamespacesInXMLPackage && exists("setMethod")) {
.InitSAXMethods()
}
# Set the error handlers to our local ones.
.C("RSXML_setErrorHandlers", PACKAGE = "XML")
}
}
.onLoad =
function(libname, pkgname)
{
# Added by Uwe Ligges.
if(.Platform$OS.type == "windows"){
fixPath = base::normalizePath
temp <- Sys.getenv("PATH")
Sys.setenv("PATH" = paste(fixPath(file.path(libname, pkgname, "libs")),
file.path(Sys.getenv("R_HOME"), "modules", fsep="\\"), temp, sep=";"))
on.exit(Sys.setenv(PATH = temp))
}
library.dynam("XML", pkgname, libname)
if(exists("setMethod")) {
# .InitSAXMethods()
}
# Set the error handlers to our local ones.
.C("RSXML_setErrorHandlers", PACKAGE = "XML")
}
.onUnload <- function (libpath)
{
library.dynam.unload("XML", libpath)
}
if(FALSE) {
.Call =
function(name, ...)
{
base::.Call(name, ..., PACKAGE = "XML")
}
.C =
function(name, ...)
{
base::.C(name, ..., PACKAGE = "XML")
}
}
#
# Copyright (c) 1998, 1999 The Omega Project for Statistical Computing.
# All rights reserved.#
XML/R/xmlTreeParse.R 0000644 0001751 0000144 00000016341 13610046417 013676 0 ustar hornik users xmlSchemaParse =
function(file, asText = FALSE, xinclude = TRUE, error = xmlErrorCumulator())
{
xmlParse(file, asText = asText, isSchema = TRUE, xinclude = xinclude, error = error)
}
BOMRegExp = "(\\xEF\\xBB\\xBF|\\xFE\\xFF|\\xFF\\xFE)"
xmlTreeParse <-
#
# XML parser that reads the entire `document' tree into memory
# and then converts it to an R/S object.
# Uses the libxml from Daniel Veillard at W3.org.
#
# asText treat the value of file as XML text, not the name of a file containing
# the XML text, and parse that.
#
#
function(file, ignoreBlanks = TRUE, handlers = NULL,
replaceEntities = FALSE, asText = FALSE, trim = TRUE, validate = FALSE, getDTD = TRUE,
isURL = FALSE, asTree = FALSE, addAttributeNamespaces = FALSE,
useInternalNodes = FALSE, isSchema = FALSE,
fullNamespaceInfo = FALSE, encoding = character(),
useDotNames = length(grep("^\\.", names(handlers))) > 0,
xinclude = TRUE, addFinalizer = TRUE, error = xmlErrorCumulator(), isHTML = FALSE, options = integer(),
parentFirst = FALSE)
{
isMissingAsText = missing(asText)
if(length(file) > 1) {
file = paste(file, collapse = "\n")
if(!missing(asText) && !asText)
stop(structure(list(message = "multiple URLs passed to xmlTreeParse. If this is the content of the file, specify asText = TRUE"),
class = c("MultipleURLError", "XMLParserError", "simpleError", "error", "condition")))
asText = TRUE
}
if(missing(isURL) && !asText)
isURL <- length(grep("^(http|ftp|file)://", file, useBytes = TRUE, perl = TRUE))
if(file == "" || length(file) == 0)
stop("empty or no content specified")
if(isHTML) {
validate = FALSE
getDTD = FALSE
isSchema = FALSE
docClass = "HTMLInternalDocument"
} else
docClass = character()
checkHandlerNames(handlers, "DOM")
if(missing(fullNamespaceInfo) && inherits(handlers, "RequiresNamespaceInfo"))
fullNamespaceInfo = TRUE
oldValidate = xmlValidity()
xmlValidity(validate)
on.exit(xmlValidity(oldValidate))
# check whether we are treating the file name as
# a) the XML text itself, or b) as a URL.
# Otherwise, check if the file exists and report an error.
if(!asText && isURL == FALSE) {
if(file.exists(file) == FALSE)
if(!missing(asText) && asText == FALSE) {
e = simpleError(paste("File", file, "does not exist"))
class(e) = c("FileNotFound", class(e))
stop(e)
}
else
asText <- TRUE
}
if(asText && length(file) > 1)
file = paste(file, collapse = "\n")
old = setEntitySubstitution(replaceEntities)
on.exit(setEntitySubstitution(old), add = TRUE)
# Look for a < in the string.
if(asText && length(grep(sprintf("^%s?\\s*<", BOMRegExp), file, perl = TRUE, useBytes = TRUE)) == 0) { # !isXMLString(file) ?
if(!isHTML || (isMissingAsText && !inherits(file, "AsIs"))) {
e = simpleError(paste("XML content does not seem to be XML:", if(file.exists(file)) file else sQuote(substring(file, 100))))
class(e) = c("XMLInputError", class(e))
(if(isHTML) warning else stop)(e)
}
}
if(!is.logical(xinclude)) {
# if(is(xinclude, "numeric"))
# xinclude = bitlist(xinclude) # see bitList.R
# else
xinclude = as.logical(xinclude)
}
if(!asText && !isURL)
file = path.expand(as.character(file))
if(useInternalNodes && trim) {
prevBlanks = .Call("RS_XML_setKeepBlanksDefault", 0L, PACKAGE = "XML")
on.exit(.Call("RS_XML_setKeepBlanksDefault", prevBlanks, PACKAGE = "XML"), add = TRUE)
}
.oldErrorHandler = setXMLErrorHandler(error)
on.exit(.Call("RS_XML_setStructuredErrorHandler", .oldErrorHandler, PACKAGE = "XML"), add = TRUE)
if(length(options))
options = sum(options) #XXX coerce to parser options
ans <- .Call("RS_XML_ParseTree", as.character(file), handlers,
as.logical(ignoreBlanks), as.logical(replaceEntities),
as.logical(asText), as.logical(trim), as.logical(validate), as.logical(getDTD),
as.logical(isURL), as.logical(addAttributeNamespaces),
as.logical(useInternalNodes), as.logical(isHTML), as.logical(isSchema),
as.logical(fullNamespaceInfo), as.character(encoding), as.logical(useDotNames),
xinclude, error, addFinalizer, as.integer(options), as.logical(parentFirst), PACKAGE = "XML")
if(!missing(handlers) && length(handlers) && !as.logical(asTree))
return(handlers)
if(!isSchema && length(class(ans)))
class(ans) = c(docClass, oldClass(class(ans)))
if(inherits(ans, "XMLInternalDocument"))
addDocFinalizer(ans, addFinalizer)
else if(!getDTD && !isSchema) {
#??? is this a good idea.
class(ans) = oldClass("XMLDocumentContent")
}
ans
}
xmlNativeTreeParse = xmlInternalTreeParse = xmlTreeParse
formals(xmlNativeTreeParse)[["useInternalNodes"]] = TRUE
formals(xmlInternalTreeParse)[["useInternalNodes"]] = TRUE
xmlParse = xmlNativeTreeParse
if(FALSE) {
# Another approach is to just change the call, as below, but this is tricky
# to get evaluation of arguments, etc. right.
tmp.xmlInternalTreeParse =
function(file, ignoreBlanks = TRUE, handlers=NULL,
replaceEntities=FALSE, asText=FALSE, trim=TRUE, validate=FALSE, getDTD=TRUE,
isURL=FALSE, asTree = FALSE, addAttributeNamespaces = FALSE,
isSchema = FALSE,
fullNamespaceInfo = FALSE, encoding = character(),
useDotNames = length(grep("^\\.", names(handlers))) > 0, # will be switched to TRUE in the future.
xinclude = TRUE, addFinalizer = TRUE)
{
e = sys.call()
e[[1]] = as.name("xmlTreeParse")
e[[length(e) + 1]] = FALSE
names(e)[length(e)] = "useInternalNodes"
eval(e, parent.env())
}
# Could try adding this to the top of xmlTreeParse
# But it won't work with, e.g. lapply(fileNames, xmlInternalTreeParse)
# if(missing(useInternalNodes) && as.character(sys.call()[[1]]) == "xmlInternalTreeParse")
# useInternalNodes = FALSE
}
setGeneric("getEncoding",
function(obj, ...)
{
standardGeneric("getEncoding")
})
setMethod("getEncoding", "ANY", function(obj, ...) NA)
setMethod("getEncoding", "XMLInternalDocument",
function(obj, ...) {
.Call("R_getDocEncoding", obj, PACKAGE = "XML")
})
setMethod("getEncoding", "XMLInternalNode",
function(obj, ...) {
.Call("R_getDocEncoding", obj, PACKAGE = "XML")
})
if(FALSE) {
setMethod("getEncoding", "XMLInternalDOM",
function(obj, ...) {
getEncoding(obj)
})
}
xmlValidity =
function(val = integer(0))
{
.Call("RS_XML_getDefaultValiditySetting", as.integer(val), PACKAGE = "XML")
}
processXInclude =
function(node, flags = 0L)
UseMethod("processXInclude")
processXInclude.list =
function(node, flags = 0L)
{
lapply(node, processXInclude, flags)
}
processXInclude.XMLInternalDocument =
function(node, flags = 0L)
{
.Call("RS_XML_xmlXIncludeProcessFlags", node, as.integer(flags), PACKAGE = "XML")
}
processXInclude.XMLInternalElementNode =
function(node, flags = 0L)
{
# if(xmlName(node) != "include") # Should check name space also
# stop("can only process XInclude on include nodes")
.Call("RS_XML_xmlXIncludeProcessTreeFlags", node, as.integer(flags), PACKAGE = "XML")
}
XML/R/schema.R 0000644 0001751 0000144 00000007704 13610046417 012526 0 ustar hornik users setClass("ExternalReference", representation(ref = "externalptr"))
setClass("libxmlTypeTable",
representation(ref = "ExternalReference"))
# Identifies the class of the element within a libxmlTypeTable sub-class.
setGeneric("getTableElementType", function(table) standardGeneric("getTableElementType"))
# The name of the element is the name of the class of the table without the Table suffix and,
# with Ref tagged on and with xml as a suffix.
setMethod("getTableElementType", "libxmlTypeTable",
function(table)
paste("xml", gsub("Table$", "Ref", class(table)), sep = "")
)
setMethod("$<-", "libxmlTypeTable",
function(x, name, value) {
stop("These tables are read-only for the moment")
})
setMethod("names", "libxmlTypeTable",
function(x) {
.Call("R_libxmlTypeTable_names", x, character(0), PACKAGE = "XML")
})
setAs("libxmlTypeTable", "list",
function(from) {
.Call("R_libxmlTypeTable_names", from, getTableElementType(from), PACKAGE = "XML")
})
setMethod("$", "libxmlTypeTable",
function(x, name) {
.Call("R_libxmlTypeTable_lookup", x, name, getTableElementType(x), PACKAGE = "XML")
})
#################################################################
setClass("xmlSchemaRef", contains = "ExternalReference")
SchemaRefFields = c("name", "targetNamespace", "version", "id",
"typeDecl", "attrDecl", "attrgrpDecl", "elemDecl", "notaDecl", "schemasImports"
)
setMethod("$", "xmlSchemaRef",
function(x, name) {
idx = pmatch(name, SchemaRefFields)
if(is.na(idx))
stop("No field ", name, " in ", paste(SchemaRefFields, collapse = ", "))
sym <- paste("R_libxmlTypeTable", SchemaRefFields[idx], sep = "_")
.Call(sym, x, PACKAGE = "XML")
})
setMethod("names", "xmlSchemaRef", function(x) SchemaRefFields)
setClass("SchemaElementTable", contains = "libxmlTypeTable")
setClass("xmlSchemaElementRef", contains = "ExternalReference")
setClass("SchemaTypeTable", contains = "libxmlTypeTable")
setClass("xmlSchemaTypeRef", contains = "ExternalReference")
setClass("SchemaAttributeTable", contains = "libxmlTypeTable")
setClass("xmlSchemaAttributeRef", contains = "ExternalReference")
setClass("SchemaAttributeGroupTable", contains = "libxmlTypeTable")
setClass("xmlSchemaAttributeGroupRef", contains = "ExternalReference")
setClass("SchemaNotationTable", contains = "libxmlTypeTable")
setClass("xmlSchemaNotationRef", contains = "ExternalReference")
schemaValidationErrorHandler =
function()
{
errors = character()
warnings = character()
h = function(msg) {
if(inherits(msg, "XMLSchemaWarning"))
warnings <<- c(warnings, msg)
else
errors <<- c(errors, msg)
}
structure(list(handler = h, results = function() list(errors = errors, warnings = warnings)), class = "XMLSchemaValidateHandler")
}
xmlSchemaValidate =
# schemaValidationErrorHandler()
function(schema, doc, errorHandler = xmlErrorFun(), options = 0L)
{
if(is.character(doc))
doc = xmlParse(doc)
if(is.character(schema))
schema = xmlSchemaParse(schema)
.oldErrorHandler = setXMLErrorHandler(if(is.list(errorHandler)) errorHandler[[1]] else errorHandler)
on.exit(.Call("RS_XML_setStructuredErrorHandler", .oldErrorHandler, PACKAGE = "XML"), add = TRUE)
status = .Call("RS_XML_xmlSchemaValidateDoc", schema@ref, doc, as.integer(options), NULL, PACKAGE = "XML") # errorHandler)
if(inherits(errorHandler, "XMLStructuredErrorCumulator"))
structure(list(status = status, errors = errorHandler[[2]]()), class = "XMLSchemaValidationResults")
else if(inherits(errorHandler, "XMLSchemaValidateHandler"))
c(status = status, errorHandler$results())
else
status
}
setOldClass("XMLSchemaValidationResults")
setMethod("show", "XMLSchemaValidationResults",
function(object)
show(object$errors))
XML/R/XMLRErrorInfo.R 0000644 0001751 0000144 00000064314 13607633665 013713 0 ustar hornik users XMLParseErrors <-
structure(c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L,
38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L,
51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L,
64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L,
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L,
90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L,
102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 200L, 201L, 202L,
203L, 204L, 205L, 500L, 501L, 502L, 503L, 504L, 505L, 506L, 507L,
508L, 509L, 510L, 511L, 512L, 513L, 514L, 515L, 516L, 517L, 518L,
519L, 520L, 521L, 522L, 523L, 524L, 525L, 526L, 527L, 528L, 529L,
530L, 531L, 532L, 533L, 534L, 535L, 536L, 537L, 538L, 539L, 540L,
541L, 800L, 801L, 1000L, 1001L, 1002L, 1003L, 1004L, 1005L, 1006L,
1007L, 1008L, 1009L, 1010L, 1011L, 1012L, 1013L, 1014L, 1015L,
1016L, 1017L, 1018L, 1019L, 1020L, 1021L, 1022L, 1023L, 1024L,
1025L, 1026L, 1027L, 1028L, 1029L, 1030L, 1031L, 1032L, 1033L,
1034L, 1035L, 1036L, 1037L, 1038L, 1039L, 1040L, 1041L, 1042L,
1043L, 1044L, 1045L, 1046L, 1047L, 1048L, 1049L, 1050L, 1051L,
1052L, 1053L, 1054L, 1055L, 1056L, 1057L, 1058L, 1059L, 1060L,
1061L, 1062L, 1063L, 1064L, 1065L, 1066L, 1067L, 1068L, 1069L,
1070L, 1071L, 1072L, 1073L, 1074L, 1075L, 1076L, 1077L, 1078L,
1079L, 1080L, 1081L, 1082L, 1083L, 1084L, 1085L, 1086L, 1087L,
1088L, 1089L, 1090L, 1091L, 1092L, 1093L, 1094L, 1095L, 1096L,
1097L, 1098L, 1099L, 1100L, 1101L, 1102L, 1103L, 1104L, 1105L,
1106L, 1107L, 1108L, 1109L, 1110L, 1111L, 1112L, 1113L, 1114L,
1115L, 1116L, 1117L, 1118L, 1119L, 1120L, 1121L, 1122L, 1200L,
1201L, 1202L, 1203L, 1204L, 1205L, 1206L, 1207L, 1208L, 1209L,
1210L, 1211L, 1212L, 1213L, 1214L, 1215L, 1216L, 1217L, 1218L,
1219L, 1220L, 1221L, 1300L, 1301L, 1302L, 1303L, 1400L, 1401L,
1402L, 1403L, 1450L, 1500L, 1501L, 1502L, 1503L, 1504L, 1505L,
1506L, 1507L, 1508L, 1509L, 1510L, 1511L, 1512L, 1513L, 1514L,
1515L, 1516L, 1517L, 1518L, 1519L, 1520L, 1521L, 1522L, 1523L,
1524L, 1525L, 1526L, 1527L, 1528L, 1529L, 1530L, 1531L, 1532L,
1533L, 1534L, 1535L, 1536L, 1537L, 1538L, 1539L, 1540L, 1541L,
1542L, 1543L, 1544L, 1545L, 1546L, 1547L, 1548L, 1549L, 1550L,
1551L, 1552L, 1553L, 1554L, 1555L, 1556L, 1600L, 1601L, 1602L,
1603L, 1604L, 1605L, 1606L, 1607L, 1608L, 1609L, 1610L, 1611L,
1612L, 1613L, 1614L, 1615L, 1616L, 1617L, 1618L, 1650L, 1651L,
1652L, 1653L, 1654L, 1700L, 1701L, 1702L, 1703L, 1704L, 1705L,
1706L, 1707L, 1708L, 1709L, 1710L, 1711L, 1712L, 1713L, 1714L,
1715L, 1716L, 1717L, 1718L, 1719L, 1720L, 1721L, 1722L, 1723L,
1724L, 1725L, 1726L, 1727L, 1728L, 1729L, 1730L, 1731L, 1732L,
1733L, 1734L, 1735L, 1736L, 1737L, 1738L, 1739L, 1740L, 1741L,
1742L, 1743L, 1744L, 1745L, 1746L, 1747L, 1748L, 1749L, 1750L,
1751L, 1752L, 1753L, 1754L, 1755L, 1756L, 1757L, 1758L, 1759L,
1760L, 1761L, 1762L, 1763L, 1764L, 1765L, 1766L, 1767L, 1768L,
1769L, 1770L, 1771L, 1772L, 1773L, 1774L, 1775L, 1776L, 1777L,
1778L, 1779L, 1780L, 1781L, 1782L, 1783L, 1784L, 1785L, 1786L,
1787L, 1788L, 1789L, 1790L, 1791L, 1792L, 1793L, 1794L, 1795L,
1796L, 1797L, 1798L, 1799L, 1800L, 1801L, 1802L, 1803L, 1804L,
1805L, 1806L, 1807L, 1808L, 1809L, 1810L, 1811L, 1812L, 1813L,
1814L, 1815L, 1816L, 1817L, 1818L, 1819L, 1820L, 1821L, 1822L,
1823L, 1824L, 1825L, 1826L, 1827L, 1828L, 1829L, 1830L, 1831L,
1832L, 1833L, 1834L, 1835L, 1836L, 1837L, 1838L, 1839L, 1840L,
1841L, 1842L, 1843L, 1844L, 1845L, 1846L, 1847L, 1848L, 1849L,
1850L, 1851L, 1852L, 1853L, 1854L, 1855L, 1856L, 1857L, 1858L,
1859L, 1860L, 1861L, 1862L, 1863L, 1864L, 1865L, 1866L, 1867L,
1868L, 1869L, 1870L, 1871L, 1872L, 1873L, 1874L, 1875L, 1876L,
1877L, 1878L, 1879L, 1900L, 1901L, 1902L, 1903L, 1950L, 1951L,
1952L, 1953L, 1954L, 1955L, 2000L, 2001L, 2002L, 2003L, 2020L,
2021L, 2022L, 3000L, 3001L, 3002L, 3003L, 3004L, 3005L, 3006L,
3007L, 3008L, 3009L, 3010L, 3011L, 3012L, 3013L, 3014L, 3015L,
3016L, 3017L, 3018L, 3019L, 3020L, 3021L, 3022L, 3023L, 3024L,
3025L, 3026L, 3027L, 3028L, 3029L, 3030L, 3031L, 3032L, 3033L,
3034L, 3035L, 3036L, 3037L, 3038L, 3039L, 3040L, 3041L, 3042L,
3043L, 3044L, 3045L, 3046L, 3047L, 3048L, 3049L, 3050L, 3051L,
3052L, 3053L, 3054L, 3055L, 3056L, 3057L, 3058L, 3059L, 3060L,
3061L, 3062L, 3063L, 3064L, 3065L, 3066L, 3067L, 3068L, 3069L,
3070L, 3071L, 3072L, 3073L, 3074L, 3075L, 3076L, 3077L, 3078L,
3079L, 3080L, 3081L, 3082L, 3083L, 3084L, 3085L, 3086L, 3087L,
3088L, 3089L, 3090L, 3091L, 4000L, 4001L, 4900L, 4901L, 5000L,
5001L, 5002L, 5003L, 5004L, 5005L, 5006L, 5007L, 5008L, 5009L,
5010L, 5011L, 5012L, 5013L, 5014L, 5015L, 5016L, 5017L, 5018L,
5019L, 5020L, 5021L, 5022L, 5023L, 5024L, 5025L, 5026L, 5027L,
5028L, 5029L, 5030L, 5031L, 5032L, 5033L, 5034L, 5035L, 5036L,
5037L, 6000L, 6001L, 6002L, 6003L, 6004L), .Names = c("XML_ERR_OK",
"XML_ERR_INTERNAL_ERROR", "XML_ERR_NO_MEMORY", "XML_ERR_DOCUMENT_START",
"XML_ERR_DOCUMENT_EMPTY", "XML_ERR_DOCUMENT_END", "XML_ERR_INVALID_HEX_CHARREF",
"XML_ERR_INVALID_DEC_CHARREF", "XML_ERR_INVALID_CHARREF", "XML_ERR_INVALID_CHAR",
"XML_ERR_CHARREF_AT_EOF", "XML_ERR_CHARREF_IN_PROLOG", "XML_ERR_CHARREF_IN_EPILOG",
"XML_ERR_CHARREF_IN_DTD", "XML_ERR_ENTITYREF_AT_EOF", "XML_ERR_ENTITYREF_IN_PROLOG",
"XML_ERR_ENTITYREF_IN_EPILOG", "XML_ERR_ENTITYREF_IN_DTD", "XML_ERR_PEREF_AT_EOF",
"XML_ERR_PEREF_IN_PROLOG", "XML_ERR_PEREF_IN_EPILOG", "XML_ERR_PEREF_IN_INT_SUBSET",
"XML_ERR_ENTITYREF_NO_NAME", "XML_ERR_ENTITYREF_SEMICOL_MISSING",
"XML_ERR_PEREF_NO_NAME", "XML_ERR_PEREF_SEMICOL_MISSING", "XML_ERR_UNDECLARED_ENTITY",
"XML_WAR_UNDECLARED_ENTITY", "XML_ERR_UNPARSED_ENTITY", "XML_ERR_ENTITY_IS_EXTERNAL",
"XML_ERR_ENTITY_IS_PARAMETER", "XML_ERR_UNKNOWN_ENCODING", "XML_ERR_UNSUPPORTED_ENCODING",
"XML_ERR_STRING_NOT_STARTED", "XML_ERR_STRING_NOT_CLOSED", "XML_ERR_NS_DECL_ERROR",
"XML_ERR_ENTITY_NOT_STARTED", "XML_ERR_ENTITY_NOT_FINISHED",
"XML_ERR_LT_IN_ATTRIBUTE", "XML_ERR_ATTRIBUTE_NOT_STARTED", "XML_ERR_ATTRIBUTE_NOT_FINISHED",
"XML_ERR_ATTRIBUTE_WITHOUT_VALUE", "XML_ERR_ATTRIBUTE_REDEFINED",
"XML_ERR_LITERAL_NOT_STARTED", "XML_ERR_LITERAL_NOT_FINISHED",
"XML_ERR_COMMENT_NOT_FINISHED", "XML_ERR_PI_NOT_STARTED", "XML_ERR_PI_NOT_FINISHED",
"XML_ERR_NOTATION_NOT_STARTED", "XML_ERR_NOTATION_NOT_FINISHED",
"XML_ERR_ATTLIST_NOT_STARTED", "XML_ERR_ATTLIST_NOT_FINISHED",
"XML_ERR_MIXED_NOT_STARTED", "XML_ERR_MIXED_NOT_FINISHED", "XML_ERR_ELEMCONTENT_NOT_STARTED",
"XML_ERR_ELEMCONTENT_NOT_FINISHED", "XML_ERR_XMLDECL_NOT_STARTED",
"XML_ERR_XMLDECL_NOT_FINISHED", "XML_ERR_CONDSEC_NOT_STARTED",
"XML_ERR_CONDSEC_NOT_FINISHED", "XML_ERR_EXT_SUBSET_NOT_FINISHED",
"XML_ERR_DOCTYPE_NOT_FINISHED", "XML_ERR_MISPLACED_CDATA_END",
"XML_ERR_CDATA_NOT_FINISHED", "XML_ERR_RESERVED_XML_NAME", "XML_ERR_SPACE_REQUIRED",
"XML_ERR_SEPARATOR_REQUIRED", "XML_ERR_NMTOKEN_REQUIRED", "XML_ERR_NAME_REQUIRED",
"XML_ERR_PCDATA_REQUIRED", "XML_ERR_URI_REQUIRED", "XML_ERR_PUBID_REQUIRED",
"XML_ERR_LT_REQUIRED", "XML_ERR_GT_REQUIRED", "XML_ERR_LTSLASH_REQUIRED",
"XML_ERR_EQUAL_REQUIRED", "XML_ERR_TAG_NAME_MISMATCH", "XML_ERR_TAG_NOT_FINISHED",
"XML_ERR_STANDALONE_VALUE", "XML_ERR_ENCODING_NAME", "XML_ERR_HYPHEN_IN_COMMENT",
"XML_ERR_INVALID_ENCODING", "XML_ERR_EXT_ENTITY_STANDALONE",
"XML_ERR_CONDSEC_INVALID", "XML_ERR_VALUE_REQUIRED", "XML_ERR_NOT_WELL_BALANCED",
"XML_ERR_EXTRA_CONTENT", "XML_ERR_ENTITY_CHAR_ERROR", "XML_ERR_ENTITY_PE_INTERNAL",
"XML_ERR_ENTITY_LOOP", "XML_ERR_ENTITY_BOUNDARY", "XML_ERR_INVALID_URI",
"XML_ERR_URI_FRAGMENT", "XML_WAR_CATALOG_PI", "XML_ERR_NO_DTD",
"XML_ERR_CONDSEC_INVALID_KEYWORD", "XML_ERR_VERSION_MISSING",
"XML_WAR_UNKNOWN_VERSION", "XML_WAR_LANG_VALUE", "XML_WAR_NS_URI",
"XML_WAR_NS_URI_RELATIVE", "XML_ERR_MISSING_ENCODING", "XML_WAR_SPACE_VALUE",
"XML_ERR_NOT_STANDALONE", "XML_ERR_ENTITY_PROCESSING", "XML_ERR_NOTATION_PROCESSING",
"XML_WAR_NS_COLUMN", "XML_WAR_ENTITY_REDEFINED", "XML_ERR_UNKNOWN_VERSION",
"XML_ERR_VERSION_MISMATCH", "XML_NS_ERR_XML_NAMESPACE", "XML_NS_ERR_UNDEFINED_NAMESPACE",
"XML_NS_ERR_QNAME", "XML_NS_ERR_ATTRIBUTE_REDEFINED", "XML_NS_ERR_EMPTY",
"XML_NS_ERR_COLON", "XML_DTD_ATTRIBUTE_DEFAULT", "XML_DTD_ATTRIBUTE_REDEFINED",
"XML_DTD_ATTRIBUTE_VALUE", "XML_DTD_CONTENT_ERROR", "XML_DTD_CONTENT_MODEL",
"XML_DTD_CONTENT_NOT_DETERMINIST", "XML_DTD_DIFFERENT_PREFIX",
"XML_DTD_ELEM_DEFAULT_NAMESPACE", "XML_DTD_ELEM_NAMESPACE", "XML_DTD_ELEM_REDEFINED",
"XML_DTD_EMPTY_NOTATION", "XML_DTD_ENTITY_TYPE", "XML_DTD_ID_FIXED",
"XML_DTD_ID_REDEFINED", "XML_DTD_ID_SUBSET", "XML_DTD_INVALID_CHILD",
"XML_DTD_INVALID_DEFAULT", "XML_DTD_LOAD_ERROR", "XML_DTD_MISSING_ATTRIBUTE",
"XML_DTD_MIXED_CORRUPT", "XML_DTD_MULTIPLE_ID", "XML_DTD_NO_DOC",
"XML_DTD_NO_DTD", "XML_DTD_NO_ELEM_NAME", "XML_DTD_NO_PREFIX",
"XML_DTD_NO_ROOT", "XML_DTD_NOTATION_REDEFINED", "XML_DTD_NOTATION_VALUE",
"XML_DTD_NOT_EMPTY", "XML_DTD_NOT_PCDATA", "XML_DTD_NOT_STANDALONE",
"XML_DTD_ROOT_NAME", "XML_DTD_STANDALONE_WHITE_SPACE", "XML_DTD_UNKNOWN_ATTRIBUTE",
"XML_DTD_UNKNOWN_ELEM", "XML_DTD_UNKNOWN_ENTITY", "XML_DTD_UNKNOWN_ID",
"XML_DTD_UNKNOWN_NOTATION", "XML_DTD_STANDALONE_DEFAULTED", "XML_DTD_XMLID_VALUE",
"XML_DTD_XMLID_TYPE", "XML_DTD_DUP_TOKEN", "XML_HTML_STRUCURE_ERROR",
"XML_HTML_UNKNOWN_TAG", "XML_RNGP_ANYNAME_ATTR_ANCESTOR", "XML_RNGP_ATTR_CONFLICT",
"XML_RNGP_ATTRIBUTE_CHILDREN", "XML_RNGP_ATTRIBUTE_CONTENT",
"XML_RNGP_ATTRIBUTE_EMPTY", "XML_RNGP_ATTRIBUTE_NOOP", "XML_RNGP_CHOICE_CONTENT",
"XML_RNGP_CHOICE_EMPTY", "XML_RNGP_CREATE_FAILURE", "XML_RNGP_DATA_CONTENT",
"XML_RNGP_DEF_CHOICE_AND_INTERLEAVE", "XML_RNGP_DEFINE_CREATE_FAILED",
"XML_RNGP_DEFINE_EMPTY", "XML_RNGP_DEFINE_MISSING", "XML_RNGP_DEFINE_NAME_MISSING",
"XML_RNGP_ELEM_CONTENT_EMPTY", "XML_RNGP_ELEM_CONTENT_ERROR",
"XML_RNGP_ELEMENT_EMPTY", "XML_RNGP_ELEMENT_CONTENT", "XML_RNGP_ELEMENT_NAME",
"XML_RNGP_ELEMENT_NO_CONTENT", "XML_RNGP_ELEM_TEXT_CONFLICT",
"XML_RNGP_EMPTY", "XML_RNGP_EMPTY_CONSTRUCT", "XML_RNGP_EMPTY_CONTENT",
"XML_RNGP_EMPTY_NOT_EMPTY", "XML_RNGP_ERROR_TYPE_LIB", "XML_RNGP_EXCEPT_EMPTY",
"XML_RNGP_EXCEPT_MISSING", "XML_RNGP_EXCEPT_MULTIPLE", "XML_RNGP_EXCEPT_NO_CONTENT",
"XML_RNGP_EXTERNALREF_EMTPY", "XML_RNGP_EXTERNAL_REF_FAILURE",
"XML_RNGP_EXTERNALREF_RECURSE", "XML_RNGP_FORBIDDEN_ATTRIBUTE",
"XML_RNGP_FOREIGN_ELEMENT", "XML_RNGP_GRAMMAR_CONTENT", "XML_RNGP_GRAMMAR_EMPTY",
"XML_RNGP_GRAMMAR_MISSING", "XML_RNGP_GRAMMAR_NO_START", "XML_RNGP_GROUP_ATTR_CONFLICT",
"XML_RNGP_HREF_ERROR", "XML_RNGP_INCLUDE_EMPTY", "XML_RNGP_INCLUDE_FAILURE",
"XML_RNGP_INCLUDE_RECURSE", "XML_RNGP_INTERLEAVE_ADD", "XML_RNGP_INTERLEAVE_CREATE_FAILED",
"XML_RNGP_INTERLEAVE_EMPTY", "XML_RNGP_INTERLEAVE_NO_CONTENT",
"XML_RNGP_INVALID_DEFINE_NAME", "XML_RNGP_INVALID_URI", "XML_RNGP_INVALID_VALUE",
"XML_RNGP_MISSING_HREF", "XML_RNGP_NAME_MISSING", "XML_RNGP_NEED_COMBINE",
"XML_RNGP_NOTALLOWED_NOT_EMPTY", "XML_RNGP_NSNAME_ATTR_ANCESTOR",
"XML_RNGP_NSNAME_NO_NS", "XML_RNGP_PARAM_FORBIDDEN", "XML_RNGP_PARAM_NAME_MISSING",
"XML_RNGP_PARENTREF_CREATE_FAILED", "XML_RNGP_PARENTREF_NAME_INVALID",
"XML_RNGP_PARENTREF_NO_NAME", "XML_RNGP_PARENTREF_NO_PARENT",
"XML_RNGP_PARENTREF_NOT_EMPTY", "XML_RNGP_PARSE_ERROR", "XML_RNGP_PAT_ANYNAME_EXCEPT_ANYNAME",
"XML_RNGP_PAT_ATTR_ATTR", "XML_RNGP_PAT_ATTR_ELEM", "XML_RNGP_PAT_DATA_EXCEPT_ATTR",
"XML_RNGP_PAT_DATA_EXCEPT_ELEM", "XML_RNGP_PAT_DATA_EXCEPT_EMPTY",
"XML_RNGP_PAT_DATA_EXCEPT_GROUP", "XML_RNGP_PAT_DATA_EXCEPT_INTERLEAVE",
"XML_RNGP_PAT_DATA_EXCEPT_LIST", "XML_RNGP_PAT_DATA_EXCEPT_ONEMORE",
"XML_RNGP_PAT_DATA_EXCEPT_REF", "XML_RNGP_PAT_DATA_EXCEPT_TEXT",
"XML_RNGP_PAT_LIST_ATTR", "XML_RNGP_PAT_LIST_ELEM", "XML_RNGP_PAT_LIST_INTERLEAVE",
"XML_RNGP_PAT_LIST_LIST", "XML_RNGP_PAT_LIST_REF", "XML_RNGP_PAT_LIST_TEXT",
"XML_RNGP_PAT_NSNAME_EXCEPT_ANYNAME", "XML_RNGP_PAT_NSNAME_EXCEPT_NSNAME",
"XML_RNGP_PAT_ONEMORE_GROUP_ATTR", "XML_RNGP_PAT_ONEMORE_INTERLEAVE_ATTR",
"XML_RNGP_PAT_START_ATTR", "XML_RNGP_PAT_START_DATA", "XML_RNGP_PAT_START_EMPTY",
"XML_RNGP_PAT_START_GROUP", "XML_RNGP_PAT_START_INTERLEAVE",
"XML_RNGP_PAT_START_LIST", "XML_RNGP_PAT_START_ONEMORE", "XML_RNGP_PAT_START_TEXT",
"XML_RNGP_PAT_START_VALUE", "XML_RNGP_PREFIX_UNDEFINED", "XML_RNGP_REF_CREATE_FAILED",
"XML_RNGP_REF_CYCLE", "XML_RNGP_REF_NAME_INVALID", "XML_RNGP_REF_NO_DEF",
"XML_RNGP_REF_NO_NAME", "XML_RNGP_REF_NOT_EMPTY", "XML_RNGP_START_CHOICE_AND_INTERLEAVE",
"XML_RNGP_START_CONTENT", "XML_RNGP_START_EMPTY", "XML_RNGP_START_MISSING",
"XML_RNGP_TEXT_EXPECTED", "XML_RNGP_TEXT_HAS_CHILD", "XML_RNGP_TYPE_MISSING",
"XML_RNGP_TYPE_NOT_FOUND", "XML_RNGP_TYPE_VALUE", "XML_RNGP_UNKNOWN_ATTRIBUTE",
"XML_RNGP_UNKNOWN_COMBINE", "XML_RNGP_UNKNOWN_CONSTRUCT", "XML_RNGP_UNKNOWN_TYPE_LIB",
"XML_RNGP_URI_FRAGMENT", "XML_RNGP_URI_NOT_ABSOLUTE", "XML_RNGP_VALUE_EMPTY",
"XML_RNGP_VALUE_NO_CONTENT", "XML_RNGP_XMLNS_NAME", "XML_RNGP_XML_NS",
"XML_XPATH_EXPRESSION_OK", "XML_XPATH_NUMBER_ERROR", "XML_XPATH_UNFINISHED_LITERAL_ERROR",
"XML_XPATH_START_LITERAL_ERROR", "XML_XPATH_VARIABLE_REF_ERROR",
"XML_XPATH_UNDEF_VARIABLE_ERROR", "XML_XPATH_INVALID_PREDICATE_ERROR",
"XML_XPATH_EXPR_ERROR", "XML_XPATH_UNCLOSED_ERROR", "XML_XPATH_UNKNOWN_FUNC_ERROR",
"XML_XPATH_INVALID_OPERAND", "XML_XPATH_INVALID_TYPE", "XML_XPATH_INVALID_ARITY",
"XML_XPATH_INVALID_CTXT_SIZE", "XML_XPATH_INVALID_CTXT_POSITION",
"XML_XPATH_MEMORY_ERROR", "XML_XPTR_SYNTAX_ERROR", "XML_XPTR_RESOURCE_ERROR",
"XML_XPTR_SUB_RESOURCE_ERROR", "XML_XPATH_UNDEF_PREFIX_ERROR",
"XML_XPATH_ENCODING_ERROR", "XML_XPATH_INVALID_CHAR_ERROR", "XML_TREE_INVALID_HEX",
"XML_TREE_INVALID_DEC", "XML_TREE_UNTERMINATED_ENTITY", "XML_TREE_NOT_UTF8",
"XML_SAVE_NOT_UTF8", "XML_SAVE_CHAR_INVALID", "XML_SAVE_NO_DOCTYPE",
"XML_SAVE_UNKNOWN_ENCODING", "XML_REGEXP_COMPILE_ERROR", "XML_IO_UNKNOWN",
"XML_IO_EACCES", "XML_IO_EAGAIN", "XML_IO_EBADF", "XML_IO_EBADMSG",
"XML_IO_EBUSY", "XML_IO_ECANCELED", "XML_IO_ECHILD", "XML_IO_EDEADLK",
"XML_IO_EDOM", "XML_IO_EEXIST", "XML_IO_EFAULT", "XML_IO_EFBIG",
"XML_IO_EINPROGRESS", "XML_IO_EINTR", "XML_IO_EINVAL", "XML_IO_EIO",
"XML_IO_EISDIR", "XML_IO_EMFILE", "XML_IO_EMLINK", "XML_IO_EMSGSIZE",
"XML_IO_ENAMETOOLONG", "XML_IO_ENFILE", "XML_IO_ENODEV", "XML_IO_ENOENT",
"XML_IO_ENOEXEC", "XML_IO_ENOLCK", "XML_IO_ENOMEM", "XML_IO_ENOSPC",
"XML_IO_ENOSYS", "XML_IO_ENOTDIR", "XML_IO_ENOTEMPTY", "XML_IO_ENOTSUP",
"XML_IO_ENOTTY", "XML_IO_ENXIO", "XML_IO_EPERM", "XML_IO_EPIPE",
"XML_IO_ERANGE", "XML_IO_EROFS", "XML_IO_ESPIPE", "XML_IO_ESRCH",
"XML_IO_ETIMEDOUT", "XML_IO_EXDEV", "XML_IO_NETWORK_ATTEMPT",
"XML_IO_ENCODER", "XML_IO_FLUSH", "XML_IO_WRITE", "XML_IO_NO_INPUT",
"XML_IO_BUFFER_FULL", "XML_IO_LOAD_ERROR", "XML_IO_ENOTSOCK",
"XML_IO_EISCONN", "XML_IO_ECONNREFUSED", "XML_IO_ENETUNREACH",
"XML_IO_EADDRINUSE", "XML_IO_EALREADY", "XML_IO_EAFNOSUPPORT",
"XML_XINCLUDE_RECURSION", "XML_XINCLUDE_PARSE_VALUE", "XML_XINCLUDE_ENTITY_DEF_MISMATCH",
"XML_XINCLUDE_NO_HREF", "XML_XINCLUDE_NO_FALLBACK", "XML_XINCLUDE_HREF_URI",
"XML_XINCLUDE_TEXT_FRAGMENT", "XML_XINCLUDE_TEXT_DOCUMENT", "XML_XINCLUDE_INVALID_CHAR",
"XML_XINCLUDE_BUILD_FAILED", "XML_XINCLUDE_UNKNOWN_ENCODING",
"XML_XINCLUDE_MULTIPLE_ROOT", "XML_XINCLUDE_XPTR_FAILED", "XML_XINCLUDE_XPTR_RESULT",
"XML_XINCLUDE_INCLUDE_IN_INCLUDE", "XML_XINCLUDE_FALLBACKS_IN_INCLUDE",
"XML_XINCLUDE_FALLBACK_NOT_IN_INCLUDE", "XML_XINCLUDE_DEPRECATED_NS",
"XML_XINCLUDE_FRAGMENT_ID", "XML_CATALOG_MISSING_ATTR", "XML_CATALOG_ENTRY_BROKEN",
"XML_CATALOG_PREFER_VALUE", "XML_CATALOG_NOT_CATALOG", "XML_CATALOG_RECURSION",
"XML_SCHEMAP_PREFIX_UNDEFINED", "XML_SCHEMAP_ATTRFORMDEFAULT_VALUE",
"XML_SCHEMAP_ATTRGRP_NONAME_NOREF", "XML_SCHEMAP_ATTR_NONAME_NOREF",
"XML_SCHEMAP_COMPLEXTYPE_NONAME_NOREF", "XML_SCHEMAP_ELEMFORMDEFAULT_VALUE",
"XML_SCHEMAP_ELEM_NONAME_NOREF", "XML_SCHEMAP_EXTENSION_NO_BASE",
"XML_SCHEMAP_FACET_NO_VALUE", "XML_SCHEMAP_FAILED_BUILD_IMPORT",
"XML_SCHEMAP_GROUP_NONAME_NOREF", "XML_SCHEMAP_IMPORT_NAMESPACE_NOT_URI",
"XML_SCHEMAP_IMPORT_REDEFINE_NSNAME", "XML_SCHEMAP_IMPORT_SCHEMA_NOT_URI",
"XML_SCHEMAP_INVALID_BOOLEAN", "XML_SCHEMAP_INVALID_ENUM", "XML_SCHEMAP_INVALID_FACET",
"XML_SCHEMAP_INVALID_FACET_VALUE", "XML_SCHEMAP_INVALID_MAXOCCURS",
"XML_SCHEMAP_INVALID_MINOCCURS", "XML_SCHEMAP_INVALID_REF_AND_SUBTYPE",
"XML_SCHEMAP_INVALID_WHITE_SPACE", "XML_SCHEMAP_NOATTR_NOREF",
"XML_SCHEMAP_NOTATION_NO_NAME", "XML_SCHEMAP_NOTYPE_NOREF", "XML_SCHEMAP_REF_AND_SUBTYPE",
"XML_SCHEMAP_RESTRICTION_NONAME_NOREF", "XML_SCHEMAP_SIMPLETYPE_NONAME",
"XML_SCHEMAP_TYPE_AND_SUBTYPE", "XML_SCHEMAP_UNKNOWN_ALL_CHILD",
"XML_SCHEMAP_UNKNOWN_ANYATTRIBUTE_CHILD", "XML_SCHEMAP_UNKNOWN_ATTR_CHILD",
"XML_SCHEMAP_UNKNOWN_ATTRGRP_CHILD", "XML_SCHEMAP_UNKNOWN_ATTRIBUTE_GROUP",
"XML_SCHEMAP_UNKNOWN_BASE_TYPE", "XML_SCHEMAP_UNKNOWN_CHOICE_CHILD",
"XML_SCHEMAP_UNKNOWN_COMPLEXCONTENT_CHILD", "XML_SCHEMAP_UNKNOWN_COMPLEXTYPE_CHILD",
"XML_SCHEMAP_UNKNOWN_ELEM_CHILD", "XML_SCHEMAP_UNKNOWN_EXTENSION_CHILD",
"XML_SCHEMAP_UNKNOWN_FACET_CHILD", "XML_SCHEMAP_UNKNOWN_FACET_TYPE",
"XML_SCHEMAP_UNKNOWN_GROUP_CHILD", "XML_SCHEMAP_UNKNOWN_IMPORT_CHILD",
"XML_SCHEMAP_UNKNOWN_LIST_CHILD", "XML_SCHEMAP_UNKNOWN_NOTATION_CHILD",
"XML_SCHEMAP_UNKNOWN_PROCESSCONTENT_CHILD", "XML_SCHEMAP_UNKNOWN_REF",
"XML_SCHEMAP_UNKNOWN_RESTRICTION_CHILD", "XML_SCHEMAP_UNKNOWN_SCHEMAS_CHILD",
"XML_SCHEMAP_UNKNOWN_SEQUENCE_CHILD", "XML_SCHEMAP_UNKNOWN_SIMPLECONTENT_CHILD",
"XML_SCHEMAP_UNKNOWN_SIMPLETYPE_CHILD", "XML_SCHEMAP_UNKNOWN_TYPE",
"XML_SCHEMAP_UNKNOWN_UNION_CHILD", "XML_SCHEMAP_ELEM_DEFAULT_FIXED",
"XML_SCHEMAP_REGEXP_INVALID", "XML_SCHEMAP_FAILED_LOAD", "XML_SCHEMAP_NOTHING_TO_PARSE",
"XML_SCHEMAP_NOROOT", "XML_SCHEMAP_REDEFINED_GROUP", "XML_SCHEMAP_REDEFINED_TYPE",
"XML_SCHEMAP_REDEFINED_ELEMENT", "XML_SCHEMAP_REDEFINED_ATTRGROUP",
"XML_SCHEMAP_REDEFINED_ATTR", "XML_SCHEMAP_REDEFINED_NOTATION",
"XML_SCHEMAP_FAILED_PARSE", "XML_SCHEMAP_UNKNOWN_PREFIX", "XML_SCHEMAP_DEF_AND_PREFIX",
"XML_SCHEMAP_UNKNOWN_INCLUDE_CHILD", "XML_SCHEMAP_INCLUDE_SCHEMA_NOT_URI",
"XML_SCHEMAP_INCLUDE_SCHEMA_NO_URI", "XML_SCHEMAP_NOT_SCHEMA",
"XML_SCHEMAP_UNKNOWN_MEMBER_TYPE", "XML_SCHEMAP_INVALID_ATTR_USE",
"XML_SCHEMAP_RECURSIVE", "XML_SCHEMAP_SUPERNUMEROUS_LIST_ITEM_TYPE",
"XML_SCHEMAP_INVALID_ATTR_COMBINATION", "XML_SCHEMAP_INVALID_ATTR_INLINE_COMBINATION",
"XML_SCHEMAP_MISSING_SIMPLETYPE_CHILD", "XML_SCHEMAP_INVALID_ATTR_NAME",
"XML_SCHEMAP_REF_AND_CONTENT", "XML_SCHEMAP_CT_PROPS_CORRECT_1",
"XML_SCHEMAP_CT_PROPS_CORRECT_2", "XML_SCHEMAP_CT_PROPS_CORRECT_3",
"XML_SCHEMAP_CT_PROPS_CORRECT_4", "XML_SCHEMAP_CT_PROPS_CORRECT_5",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_1", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_1_1",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_1_2", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_2",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_3", "XML_SCHEMAP_WILDCARD_INVALID_NS_MEMBER",
"XML_SCHEMAP_INTERSECTION_NOT_EXPRESSIBLE", "XML_SCHEMAP_UNION_NOT_EXPRESSIBLE",
"XML_SCHEMAP_SRC_IMPORT_3_1", "XML_SCHEMAP_SRC_IMPORT_3_2", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_4_1",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_4_2", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_4_3",
"XML_SCHEMAP_COS_CT_EXTENDS_1_3", "XML_SCHEMAV_NOROOT", "XML_SCHEMAV_UNDECLAREDELEM",
"XML_SCHEMAV_NOTTOPLEVEL", "XML_SCHEMAV_MISSING", "XML_SCHEMAV_WRONGELEM",
"XML_SCHEMAV_NOTYPE", "XML_SCHEMAV_NOROLLBACK", "XML_SCHEMAV_ISABSTRACT",
"XML_SCHEMAV_NOTEMPTY", "XML_SCHEMAV_ELEMCONT", "XML_SCHEMAV_HAVEDEFAULT",
"XML_SCHEMAV_NOTNILLABLE", "XML_SCHEMAV_EXTRACONTENT", "XML_SCHEMAV_INVALIDATTR",
"XML_SCHEMAV_INVALIDELEM", "XML_SCHEMAV_NOTDETERMINIST", "XML_SCHEMAV_CONSTRUCT",
"XML_SCHEMAV_INTERNAL", "XML_SCHEMAV_NOTSIMPLE", "XML_SCHEMAV_ATTRUNKNOWN",
"XML_SCHEMAV_ATTRINVALID", "XML_SCHEMAV_VALUE", "XML_SCHEMAV_FACET",
"XML_SCHEMAV_CVC_DATATYPE_VALID_1_2_1", "XML_SCHEMAV_CVC_DATATYPE_VALID_1_2_2",
"XML_SCHEMAV_CVC_DATATYPE_VALID_1_2_3", "XML_SCHEMAV_CVC_TYPE_3_1_1",
"XML_SCHEMAV_CVC_TYPE_3_1_2", "XML_SCHEMAV_CVC_FACET_VALID",
"XML_SCHEMAV_CVC_LENGTH_VALID", "XML_SCHEMAV_CVC_MINLENGTH_VALID",
"XML_SCHEMAV_CVC_MAXLENGTH_VALID", "XML_SCHEMAV_CVC_MININCLUSIVE_VALID",
"XML_SCHEMAV_CVC_MAXINCLUSIVE_VALID", "XML_SCHEMAV_CVC_MINEXCLUSIVE_VALID",
"XML_SCHEMAV_CVC_MAXEXCLUSIVE_VALID", "XML_SCHEMAV_CVC_TOTALDIGITS_VALID",
"XML_SCHEMAV_CVC_FRACTIONDIGITS_VALID", "XML_SCHEMAV_CVC_PATTERN_VALID",
"XML_SCHEMAV_CVC_ENUMERATION_VALID", "XML_SCHEMAV_CVC_COMPLEX_TYPE_2_1",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_2_2", "XML_SCHEMAV_CVC_COMPLEX_TYPE_2_3",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_2_4", "XML_SCHEMAV_CVC_ELT_1",
"XML_SCHEMAV_CVC_ELT_2", "XML_SCHEMAV_CVC_ELT_3_1", "XML_SCHEMAV_CVC_ELT_3_2_1",
"XML_SCHEMAV_CVC_ELT_3_2_2", "XML_SCHEMAV_CVC_ELT_4_1", "XML_SCHEMAV_CVC_ELT_4_2",
"XML_SCHEMAV_CVC_ELT_4_3", "XML_SCHEMAV_CVC_ELT_5_1_1", "XML_SCHEMAV_CVC_ELT_5_1_2",
"XML_SCHEMAV_CVC_ELT_5_2_1", "XML_SCHEMAV_CVC_ELT_5_2_2_1", "XML_SCHEMAV_CVC_ELT_5_2_2_2_1",
"XML_SCHEMAV_CVC_ELT_5_2_2_2_2", "XML_SCHEMAV_CVC_ELT_6", "XML_SCHEMAV_CVC_ELT_7",
"XML_SCHEMAV_CVC_ATTRIBUTE_1", "XML_SCHEMAV_CVC_ATTRIBUTE_2",
"XML_SCHEMAV_CVC_ATTRIBUTE_3", "XML_SCHEMAV_CVC_ATTRIBUTE_4",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_3_1", "XML_SCHEMAV_CVC_COMPLEX_TYPE_3_2_1",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_3_2_2", "XML_SCHEMAV_CVC_COMPLEX_TYPE_4",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_5_1", "XML_SCHEMAV_CVC_COMPLEX_TYPE_5_2",
"XML_SCHEMAV_ELEMENT_CONTENT", "XML_SCHEMAV_DOCUMENT_ELEMENT_MISSING",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_1", "XML_SCHEMAV_CVC_AU", "XML_SCHEMAV_CVC_TYPE_1",
"XML_SCHEMAV_CVC_TYPE_2", "XML_SCHEMAV_CVC_IDC", "XML_SCHEMAV_CVC_WILDCARD",
"XML_SCHEMAV_MISC", "XML_XPTR_UNKNOWN_SCHEME", "XML_XPTR_CHILDSEQ_START",
"XML_XPTR_EVAL_FAILED", "XML_XPTR_EXTRA_OBJECTS", "XML_C14N_CREATE_CTXT",
"XML_C14N_REQUIRES_UTF8", "XML_C14N_CREATE_STACK", "XML_C14N_INVALID_NODE",
"XML_C14N_UNKNOW_NODE", "XML_C14N_RELATIVE_NAMESPACE", "XML_FTP_PASV_ANSWER",
"XML_FTP_EPSV_ANSWER", "XML_FTP_ACCNT", "XML_FTP_URL_SYNTAX",
"XML_HTTP_URL_SYNTAX", "XML_HTTP_USE_IP", "XML_HTTP_UNKNOWN_HOST",
"XML_SCHEMAP_SRC_SIMPLE_TYPE_1", "XML_SCHEMAP_SRC_SIMPLE_TYPE_2",
"XML_SCHEMAP_SRC_SIMPLE_TYPE_3", "XML_SCHEMAP_SRC_SIMPLE_TYPE_4",
"XML_SCHEMAP_SRC_RESOLVE", "XML_SCHEMAP_SRC_RESTRICTION_BASE_OR_SIMPLETYPE",
"XML_SCHEMAP_SRC_LIST_ITEMTYPE_OR_SIMPLETYPE", "XML_SCHEMAP_SRC_UNION_MEMBERTYPES_OR_SIMPLETYPES",
"XML_SCHEMAP_ST_PROPS_CORRECT_1", "XML_SCHEMAP_ST_PROPS_CORRECT_2",
"XML_SCHEMAP_ST_PROPS_CORRECT_3", "XML_SCHEMAP_COS_ST_RESTRICTS_1_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_1_2", "XML_SCHEMAP_COS_ST_RESTRICTS_1_3_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_1_3_2", "XML_SCHEMAP_COS_ST_RESTRICTS_2_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_1_1", "XML_SCHEMAP_COS_ST_RESTRICTS_2_3_1_2",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_1", "XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_2",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_3", "XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_4",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_5", "XML_SCHEMAP_COS_ST_RESTRICTS_3_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_1", "XML_SCHEMAP_COS_ST_RESTRICTS_3_3_1_2",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_2", "XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_3", "XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_4",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_5", "XML_SCHEMAP_COS_ST_DERIVED_OK_2_1",
"XML_SCHEMAP_COS_ST_DERIVED_OK_2_2", "XML_SCHEMAP_S4S_ELEM_NOT_ALLOWED",
"XML_SCHEMAP_S4S_ELEM_MISSING", "XML_SCHEMAP_S4S_ATTR_NOT_ALLOWED",
"XML_SCHEMAP_S4S_ATTR_MISSING", "XML_SCHEMAP_S4S_ATTR_INVALID_VALUE",
"XML_SCHEMAP_SRC_ELEMENT_1", "XML_SCHEMAP_SRC_ELEMENT_2_1", "XML_SCHEMAP_SRC_ELEMENT_2_2",
"XML_SCHEMAP_SRC_ELEMENT_3", "XML_SCHEMAP_P_PROPS_CORRECT_1",
"XML_SCHEMAP_P_PROPS_CORRECT_2_1", "XML_SCHEMAP_P_PROPS_CORRECT_2_2",
"XML_SCHEMAP_E_PROPS_CORRECT_2", "XML_SCHEMAP_E_PROPS_CORRECT_3",
"XML_SCHEMAP_E_PROPS_CORRECT_4", "XML_SCHEMAP_E_PROPS_CORRECT_5",
"XML_SCHEMAP_E_PROPS_CORRECT_6", "XML_SCHEMAP_SRC_INCLUDE", "XML_SCHEMAP_SRC_ATTRIBUTE_1",
"XML_SCHEMAP_SRC_ATTRIBUTE_2", "XML_SCHEMAP_SRC_ATTRIBUTE_3_1",
"XML_SCHEMAP_SRC_ATTRIBUTE_3_2", "XML_SCHEMAP_SRC_ATTRIBUTE_4",
"XML_SCHEMAP_NO_XMLNS", "XML_SCHEMAP_NO_XSI", "XML_SCHEMAP_COS_VALID_DEFAULT_1",
"XML_SCHEMAP_COS_VALID_DEFAULT_2_1", "XML_SCHEMAP_COS_VALID_DEFAULT_2_2_1",
"XML_SCHEMAP_COS_VALID_DEFAULT_2_2_2", "XML_SCHEMAP_CVC_SIMPLE_TYPE",
"XML_SCHEMAP_COS_CT_EXTENDS_1_1", "XML_SCHEMAP_SRC_IMPORT_1_1",
"XML_SCHEMAP_SRC_IMPORT_1_2", "XML_SCHEMAP_SRC_IMPORT_2", "XML_SCHEMAP_SRC_IMPORT_2_1",
"XML_SCHEMAP_SRC_IMPORT_2_2", "XML_SCHEMAP_INTERNAL", "XML_SCHEMAP_NOT_DETERMINISTIC",
"XML_SCHEMAP_SRC_ATTRIBUTE_GROUP_1", "XML_SCHEMAP_SRC_ATTRIBUTE_GROUP_2",
"XML_SCHEMAP_SRC_ATTRIBUTE_GROUP_3", "XML_SCHEMAP_MG_PROPS_CORRECT_1",
"XML_SCHEMAP_MG_PROPS_CORRECT_2", "XML_SCHEMAP_SRC_CT_1", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_1_3",
"XML_SCHEMAP_AU_PROPS_CORRECT_2", "XML_SCHEMAP_A_PROPS_CORRECT_2",
"XML_SCHEMAP_C_PROPS_CORRECT", "XML_SCHEMAP_SRC_REDEFINE", "XML_SCHEMAP_SRC_IMPORT",
"XML_SCHEMAP_WARN_SKIP_SCHEMA", "XML_SCHEMAP_WARN_UNLOCATED_SCHEMA",
"XML_SCHEMAP_WARN_ATTR_REDECL_PROH", "XML_SCHEMAP_WARN_ATTR_POINTLESS_PROH",
"XML_SCHEMAP_AG_PROPS_CORRECT", "XML_SCHEMAP_COS_CT_EXTENDS_1_2",
"XML_SCHEMAP_AU_PROPS_CORRECT", "XML_SCHEMAP_A_PROPS_CORRECT_3",
"XML_SCHEMAP_COS_ALL_LIMITED", "XML_SCHEMATRONV_ASSERT", "XML_SCHEMATRONV_REPORT",
"XML_MODULE_OPEN", "XML_MODULE_CLOSE", "XML_CHECK_FOUND_ELEMENT",
"XML_CHECK_FOUND_ATTRIBUTE", "XML_CHECK_FOUND_TEXT", "XML_CHECK_FOUND_CDATA",
"XML_CHECK_FOUND_ENTITYREF", "XML_CHECK_FOUND_ENTITY", "XML_CHECK_FOUND_PI",
"XML_CHECK_FOUND_COMMENT", "XML_CHECK_FOUND_DOCTYPE", "XML_CHECK_FOUND_FRAGMENT",
"XML_CHECK_FOUND_NOTATION", "XML_CHECK_UNKNOWN_NODE", "XML_CHECK_ENTITY_TYPE",
"XML_CHECK_NO_PARENT", "XML_CHECK_NO_DOC", "XML_CHECK_NO_NAME",
"XML_CHECK_NO_ELEM", "XML_CHECK_WRONG_DOC", "XML_CHECK_NO_PREV",
"XML_CHECK_WRONG_PREV", "XML_CHECK_NO_NEXT", "XML_CHECK_WRONG_NEXT",
"XML_CHECK_NOT_DTD", "XML_CHECK_NOT_ATTR", "XML_CHECK_NOT_ATTR_DECL",
"XML_CHECK_NOT_ELEM_DECL", "XML_CHECK_NOT_ENTITY_DECL", "XML_CHECK_NOT_NS_DECL",
"XML_CHECK_NO_HREF", "XML_CHECK_WRONG_PARENT", "XML_CHECK_NS_SCOPE",
"XML_CHECK_NS_ANCESTOR", "XML_CHECK_NOT_UTF8", "XML_CHECK_NO_DICT",
"XML_CHECK_NOT_NCNAME", "XML_CHECK_OUTSIDE_DICT", "XML_CHECK_WRONG_NAME",
"XML_CHECK_NAME_NOT_NULL", "XML_I18N_NO_NAME", "XML_I18N_NO_HANDLER",
"XML_I18N_EXCESS_HANDLER", "XML_I18N_CONV_FAILED", "XML_I18N_NO_OUTPUT"
))
XMLDomainErrors <-
structure(0:28, .Names = c("NONE", "PARSER", "TREE", "NAMESPACE",
"DTD", "HTML", "MEMORY", "OUTPUT", "IO", "FTP", "HTTP", "XINCLUDE",
"XPATH", "XPOINTER", "REGEXP", "DATATYPE", "SCHEMASP", "SCHEMASV",
"RELAXNGP", "RELAXNGV", "CATALOG", "C14N", "XSLT", "VALID", "CHECK",
"WRITER", "MODULE", "I18N", "SCHEMATRONV"))
XML/R/solrDocs.R 0000644 0001751 0000144 00000002472 13607633674 013070 0 ustar hornik users setGeneric("readSolrDoc",
function(doc, ...)
standardGeneric("readSolrDoc"))
setMethod("readSolrDoc", "character",
function(doc, ...)
readSolrDoc(xmlParse(doc), ...))
setMethod("readSolrDoc", "AsIs",
function(doc, ...)
readSolrDoc(xmlParse(doc), ...))
setMethod("readSolrDoc", "XMLInternalDocument",
function(doc, ...)
readSolrDoc(xmlRoot(doc), ...))
setMethod("readSolrDoc", "XMLInternalNode",
function(doc, ...) {
kids = xmlChildren(doc)
kids = kids[!sapply(kids, inherits, "XMLInternalCommentNode")]
if(length(kids) == 0)
return(list())
keys = sapply(kids, xmlGetAttr, "name")
structure(lapply(kids, readSolrNodeValue),
names = keys)
})
readSolrNodeValue =
function(node)
{
id = xmlName(node)
switch(id, int = if(abs(tmp <- as.numeric(xmlValue(node))) > .Machine$integer.max) tmp else as.integer(xmlValue(node)),
long = as.numeric(xmlValue(node)),
str = xmlValue(node),
lst = readSolrDoc(node),
bool = as.logical(xmlValue(node)),
date = as.POSIXct(strptime(xmlValue(node), "%Y-%m-%dT%H:%M:%SZ")),
)
}
XML/R/xmlErrorEnums.R 0000644 0001751 0000144 00000064645 13607633670 014130 0 ustar hornik users # Machine generated file. See org/omegahat/XML/RS/TU/tu.R
# Wed Feb 13 03:17:42 2008
xmlErrorLevel <-structure(0:3, .Names = c("XML_ERR_NONE", "XML_ERR_WARNING",
"XML_ERR_ERROR", "XML_ERR_FATAL"))
xmlErrorDomain <-structure(0:27, .Names = c("XML_FROM_NONE", "XML_FROM_PARSER",
"XML_FROM_TREE", "XML_FROM_NAMESPACE", "XML_FROM_DTD", "XML_FROM_HTML",
"XML_FROM_MEMORY", "XML_FROM_OUTPUT", "XML_FROM_IO", "XML_FROM_FTP",
"XML_FROM_HTTP", "XML_FROM_XINCLUDE", "XML_FROM_XPATH", "XML_FROM_XPOINTER",
"XML_FROM_REGEXP", "XML_FROM_DATATYPE", "XML_FROM_SCHEMASP",
"XML_FROM_SCHEMASV", "XML_FROM_RELAXNGP", "XML_FROM_RELAXNGV",
"XML_FROM_CATALOG", "XML_FROM_C14N", "XML_FROM_XSLT", "XML_FROM_VALID",
"XML_FROM_CHECK", "XML_FROM_WRITER", "XML_FROM_MODULE", "XML_FROM_I18N"
))
xmlParserErrors <-structure(c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L,
38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L,
51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L,
64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L,
77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L,
90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L,
102L, 103L, 104L, 105L, 106L, 107L, 200L, 201L, 202L, 203L, 204L,
500L, 501L, 502L, 503L, 504L, 505L, 506L, 507L, 508L, 509L, 510L,
511L, 512L, 513L, 514L, 515L, 516L, 517L, 518L, 519L, 520L, 521L,
522L, 523L, 524L, 525L, 526L, 527L, 528L, 529L, 530L, 531L, 532L,
533L, 534L, 535L, 536L, 537L, 538L, 539L, 540L, 800L, 801L, 1000L,
1001L, 1002L, 1003L, 1004L, 1005L, 1006L, 1007L, 1008L, 1009L,
1010L, 1011L, 1012L, 1013L, 1014L, 1015L, 1016L, 1017L, 1018L,
1019L, 1020L, 1021L, 1022L, 1023L, 1024L, 1025L, 1026L, 1027L,
1028L, 1029L, 1030L, 1031L, 1032L, 1033L, 1034L, 1035L, 1036L,
1037L, 1038L, 1039L, 1040L, 1041L, 1042L, 1043L, 1044L, 1045L,
1046L, 1047L, 1048L, 1049L, 1050L, 1051L, 1052L, 1053L, 1054L,
1055L, 1056L, 1057L, 1058L, 1059L, 1060L, 1061L, 1062L, 1063L,
1064L, 1065L, 1066L, 1067L, 1068L, 1069L, 1070L, 1071L, 1072L,
1073L, 1074L, 1075L, 1076L, 1077L, 1078L, 1079L, 1080L, 1081L,
1082L, 1083L, 1084L, 1085L, 1086L, 1087L, 1088L, 1089L, 1090L,
1091L, 1092L, 1093L, 1094L, 1095L, 1096L, 1097L, 1098L, 1099L,
1100L, 1101L, 1102L, 1103L, 1104L, 1105L, 1106L, 1107L, 1108L,
1109L, 1110L, 1111L, 1112L, 1113L, 1114L, 1115L, 1116L, 1117L,
1118L, 1119L, 1120L, 1121L, 1122L, 1200L, 1201L, 1202L, 1203L,
1204L, 1205L, 1206L, 1207L, 1208L, 1209L, 1210L, 1211L, 1212L,
1213L, 1214L, 1215L, 1216L, 1217L, 1218L, 1219L, 1220L, 1221L,
1300L, 1301L, 1302L, 1400L, 1401L, 1402L, 1403L, 1450L, 1500L,
1501L, 1502L, 1503L, 1504L, 1505L, 1506L, 1507L, 1508L, 1509L,
1510L, 1511L, 1512L, 1513L, 1514L, 1515L, 1516L, 1517L, 1518L,
1519L, 1520L, 1521L, 1522L, 1523L, 1524L, 1525L, 1526L, 1527L,
1528L, 1529L, 1530L, 1531L, 1532L, 1533L, 1534L, 1535L, 1536L,
1537L, 1538L, 1539L, 1540L, 1541L, 1542L, 1543L, 1544L, 1545L,
1546L, 1547L, 1548L, 1549L, 1550L, 1551L, 1552L, 1553L, 1554L,
1555L, 1556L, 1600L, 1601L, 1602L, 1603L, 1604L, 1605L, 1606L,
1607L, 1608L, 1609L, 1610L, 1611L, 1612L, 1613L, 1614L, 1615L,
1616L, 1617L, 1618L, 1650L, 1651L, 1652L, 1653L, 1654L, 1700L,
1701L, 1702L, 1703L, 1704L, 1705L, 1706L, 1707L, 1708L, 1709L,
1710L, 1711L, 1712L, 1713L, 1714L, 1715L, 1716L, 1717L, 1718L,
1719L, 1720L, 1721L, 1722L, 1723L, 1724L, 1725L, 1726L, 1727L,
1728L, 1729L, 1730L, 1731L, 1732L, 1733L, 1734L, 1735L, 1736L,
1737L, 1738L, 1739L, 1740L, 1741L, 1742L, 1743L, 1744L, 1745L,
1746L, 1747L, 1748L, 1749L, 1750L, 1751L, 1752L, 1753L, 1754L,
1755L, 1756L, 1757L, 1758L, 1759L, 1760L, 1761L, 1762L, 1763L,
1764L, 1765L, 1766L, 1767L, 1768L, 1769L, 1770L, 1771L, 1772L,
1773L, 1774L, 1775L, 1776L, 1777L, 1778L, 1779L, 1780L, 1781L,
1782L, 1783L, 1784L, 1785L, 1786L, 1787L, 1788L, 1789L, 1790L,
1791L, 1792L, 1793L, 1794L, 1795L, 1796L, 1797L, 1798L, 1799L,
1800L, 1801L, 1802L, 1803L, 1804L, 1805L, 1806L, 1807L, 1808L,
1809L, 1810L, 1811L, 1812L, 1813L, 1814L, 1815L, 1816L, 1817L,
1818L, 1819L, 1820L, 1821L, 1822L, 1823L, 1824L, 1825L, 1826L,
1827L, 1828L, 1829L, 1830L, 1831L, 1832L, 1833L, 1834L, 1835L,
1836L, 1837L, 1838L, 1839L, 1840L, 1841L, 1842L, 1843L, 1844L,
1845L, 1846L, 1847L, 1848L, 1849L, 1850L, 1851L, 1852L, 1853L,
1854L, 1855L, 1856L, 1857L, 1858L, 1859L, 1860L, 1861L, 1862L,
1863L, 1864L, 1865L, 1866L, 1867L, 1868L, 1869L, 1870L, 1871L,
1872L, 1873L, 1874L, 1875L, 1876L, 1877L, 1878L, 1879L, 1900L,
1901L, 1902L, 1903L, 1950L, 1951L, 1952L, 1953L, 1954L, 1955L,
2000L, 2001L, 2002L, 2003L, 2020L, 2021L, 2022L, 3000L, 3001L,
3002L, 3003L, 3004L, 3005L, 3006L, 3007L, 3008L, 3009L, 3010L,
3011L, 3012L, 3013L, 3014L, 3015L, 3016L, 3017L, 3018L, 3019L,
3020L, 3021L, 3022L, 3023L, 3024L, 3025L, 3026L, 3027L, 3028L,
3029L, 3030L, 3031L, 3032L, 3033L, 3034L, 3035L, 3036L, 3037L,
3038L, 3039L, 3040L, 3041L, 3042L, 3043L, 3044L, 3045L, 3046L,
3047L, 3048L, 3049L, 3050L, 3051L, 3052L, 3053L, 3054L, 3055L,
3056L, 3057L, 3058L, 3059L, 3060L, 3061L, 3062L, 3063L, 3064L,
3065L, 3066L, 3067L, 3068L, 3069L, 3070L, 3071L, 3072L, 3073L,
3074L, 3075L, 3076L, 3077L, 3078L, 3079L, 3080L, 3081L, 3082L,
3083L, 3084L, 3085L, 3086L, 3087L, 3088L, 3089L, 3090L, 3091L,
4900L, 4901L, 5000L, 5001L, 5002L, 5003L, 5004L, 5005L, 5006L,
5007L, 5008L, 5009L, 5010L, 5011L, 5012L, 5013L, 5014L, 5015L,
5016L, 5017L, 5018L, 5019L, 5020L, 5021L, 5022L, 5023L, 5024L,
5025L, 5026L, 5027L, 5028L, 5029L, 5030L, 5031L, 5032L, 5033L,
5034L, 5035L, 5036L, 5037L, 6000L, 6001L, 6002L, 6003L, 6004L
), .Names = c("XML_ERR_OK", "XML_ERR_INTERNAL_ERROR", "XML_ERR_NO_MEMORY",
"XML_ERR_DOCUMENT_START", "XML_ERR_DOCUMENT_EMPTY", "XML_ERR_DOCUMENT_END",
"XML_ERR_INVALID_HEX_CHARREF", "XML_ERR_INVALID_DEC_CHARREF",
"XML_ERR_INVALID_CHARREF", "XML_ERR_INVALID_CHAR", "XML_ERR_CHARREF_AT_EOF",
"XML_ERR_CHARREF_IN_PROLOG", "XML_ERR_CHARREF_IN_EPILOG", "XML_ERR_CHARREF_IN_DTD",
"XML_ERR_ENTITYREF_AT_EOF", "XML_ERR_ENTITYREF_IN_PROLOG", "XML_ERR_ENTITYREF_IN_EPILOG",
"XML_ERR_ENTITYREF_IN_DTD", "XML_ERR_PEREF_AT_EOF", "XML_ERR_PEREF_IN_PROLOG",
"XML_ERR_PEREF_IN_EPILOG", "XML_ERR_PEREF_IN_INT_SUBSET", "XML_ERR_ENTITYREF_NO_NAME",
"XML_ERR_ENTITYREF_SEMICOL_MISSING", "XML_ERR_PEREF_NO_NAME",
"XML_ERR_PEREF_SEMICOL_MISSING", "XML_ERR_UNDECLARED_ENTITY",
"XML_WAR_UNDECLARED_ENTITY", "XML_ERR_UNPARSED_ENTITY", "XML_ERR_ENTITY_IS_EXTERNAL",
"XML_ERR_ENTITY_IS_PARAMETER", "XML_ERR_UNKNOWN_ENCODING", "XML_ERR_UNSUPPORTED_ENCODING",
"XML_ERR_STRING_NOT_STARTED", "XML_ERR_STRING_NOT_CLOSED", "XML_ERR_NS_DECL_ERROR",
"XML_ERR_ENTITY_NOT_STARTED", "XML_ERR_ENTITY_NOT_FINISHED",
"XML_ERR_LT_IN_ATTRIBUTE", "XML_ERR_ATTRIBUTE_NOT_STARTED", "XML_ERR_ATTRIBUTE_NOT_FINISHED",
"XML_ERR_ATTRIBUTE_WITHOUT_VALUE", "XML_ERR_ATTRIBUTE_REDEFINED",
"XML_ERR_LITERAL_NOT_STARTED", "XML_ERR_LITERAL_NOT_FINISHED",
"XML_ERR_COMMENT_NOT_FINISHED", "XML_ERR_PI_NOT_STARTED", "XML_ERR_PI_NOT_FINISHED",
"XML_ERR_NOTATION_NOT_STARTED", "XML_ERR_NOTATION_NOT_FINISHED",
"XML_ERR_ATTLIST_NOT_STARTED", "XML_ERR_ATTLIST_NOT_FINISHED",
"XML_ERR_MIXED_NOT_STARTED", "XML_ERR_MIXED_NOT_FINISHED", "XML_ERR_ELEMCONTENT_NOT_STARTED",
"XML_ERR_ELEMCONTENT_NOT_FINISHED", "XML_ERR_XMLDECL_NOT_STARTED",
"XML_ERR_XMLDECL_NOT_FINISHED", "XML_ERR_CONDSEC_NOT_STARTED",
"XML_ERR_CONDSEC_NOT_FINISHED", "XML_ERR_EXT_SUBSET_NOT_FINISHED",
"XML_ERR_DOCTYPE_NOT_FINISHED", "XML_ERR_MISPLACED_CDATA_END",
"XML_ERR_CDATA_NOT_FINISHED", "XML_ERR_RESERVED_XML_NAME", "XML_ERR_SPACE_REQUIRED",
"XML_ERR_SEPARATOR_REQUIRED", "XML_ERR_NMTOKEN_REQUIRED", "XML_ERR_NAME_REQUIRED",
"XML_ERR_PCDATA_REQUIRED", "XML_ERR_URI_REQUIRED", "XML_ERR_PUBID_REQUIRED",
"XML_ERR_LT_REQUIRED", "XML_ERR_GT_REQUIRED", "XML_ERR_LTSLASH_REQUIRED",
"XML_ERR_EQUAL_REQUIRED", "XML_ERR_TAG_NAME_MISMATCH", "XML_ERR_TAG_NOT_FINISHED",
"XML_ERR_STANDALONE_VALUE", "XML_ERR_ENCODING_NAME", "XML_ERR_HYPHEN_IN_COMMENT",
"XML_ERR_INVALID_ENCODING", "XML_ERR_EXT_ENTITY_STANDALONE",
"XML_ERR_CONDSEC_INVALID", "XML_ERR_VALUE_REQUIRED", "XML_ERR_NOT_WELL_BALANCED",
"XML_ERR_EXTRA_CONTENT", "XML_ERR_ENTITY_CHAR_ERROR", "XML_ERR_ENTITY_PE_INTERNAL",
"XML_ERR_ENTITY_LOOP", "XML_ERR_ENTITY_BOUNDARY", "XML_ERR_INVALID_URI",
"XML_ERR_URI_FRAGMENT", "XML_WAR_CATALOG_PI", "XML_ERR_NO_DTD",
"XML_ERR_CONDSEC_INVALID_KEYWORD", "XML_ERR_VERSION_MISSING",
"XML_WAR_UNKNOWN_VERSION", "XML_WAR_LANG_VALUE", "XML_WAR_NS_URI",
"XML_WAR_NS_URI_RELATIVE", "XML_ERR_MISSING_ENCODING", "XML_WAR_SPACE_VALUE",
"XML_ERR_NOT_STANDALONE", "XML_ERR_ENTITY_PROCESSING", "XML_ERR_NOTATION_PROCESSING",
"XML_WAR_NS_COLUMN", "XML_WAR_ENTITY_REDEFINED", "XML_NS_ERR_XML_NAMESPACE",
"XML_NS_ERR_UNDEFINED_NAMESPACE", "XML_NS_ERR_QNAME", "XML_NS_ERR_ATTRIBUTE_REDEFINED",
"XML_NS_ERR_EMPTY", "XML_DTD_ATTRIBUTE_DEFAULT", "XML_DTD_ATTRIBUTE_REDEFINED",
"XML_DTD_ATTRIBUTE_VALUE", "XML_DTD_CONTENT_ERROR", "XML_DTD_CONTENT_MODEL",
"XML_DTD_CONTENT_NOT_DETERMINIST", "XML_DTD_DIFFERENT_PREFIX",
"XML_DTD_ELEM_DEFAULT_NAMESPACE", "XML_DTD_ELEM_NAMESPACE", "XML_DTD_ELEM_REDEFINED",
"XML_DTD_EMPTY_NOTATION", "XML_DTD_ENTITY_TYPE", "XML_DTD_ID_FIXED",
"XML_DTD_ID_REDEFINED", "XML_DTD_ID_SUBSET", "XML_DTD_INVALID_CHILD",
"XML_DTD_INVALID_DEFAULT", "XML_DTD_LOAD_ERROR", "XML_DTD_MISSING_ATTRIBUTE",
"XML_DTD_MIXED_CORRUPT", "XML_DTD_MULTIPLE_ID", "XML_DTD_NO_DOC",
"XML_DTD_NO_DTD", "XML_DTD_NO_ELEM_NAME", "XML_DTD_NO_PREFIX",
"XML_DTD_NO_ROOT", "XML_DTD_NOTATION_REDEFINED", "XML_DTD_NOTATION_VALUE",
"XML_DTD_NOT_EMPTY", "XML_DTD_NOT_PCDATA", "XML_DTD_NOT_STANDALONE",
"XML_DTD_ROOT_NAME", "XML_DTD_STANDALONE_WHITE_SPACE", "XML_DTD_UNKNOWN_ATTRIBUTE",
"XML_DTD_UNKNOWN_ELEM", "XML_DTD_UNKNOWN_ENTITY", "XML_DTD_UNKNOWN_ID",
"XML_DTD_UNKNOWN_NOTATION", "XML_DTD_STANDALONE_DEFAULTED", "XML_DTD_XMLID_VALUE",
"XML_DTD_XMLID_TYPE", "XML_HTML_STRUCURE_ERROR", "XML_HTML_UNKNOWN_TAG",
"XML_RNGP_ANYNAME_ATTR_ANCESTOR", "XML_RNGP_ATTR_CONFLICT", "XML_RNGP_ATTRIBUTE_CHILDREN",
"XML_RNGP_ATTRIBUTE_CONTENT", "XML_RNGP_ATTRIBUTE_EMPTY", "XML_RNGP_ATTRIBUTE_NOOP",
"XML_RNGP_CHOICE_CONTENT", "XML_RNGP_CHOICE_EMPTY", "XML_RNGP_CREATE_FAILURE",
"XML_RNGP_DATA_CONTENT", "XML_RNGP_DEF_CHOICE_AND_INTERLEAVE",
"XML_RNGP_DEFINE_CREATE_FAILED", "XML_RNGP_DEFINE_EMPTY", "XML_RNGP_DEFINE_MISSING",
"XML_RNGP_DEFINE_NAME_MISSING", "XML_RNGP_ELEM_CONTENT_EMPTY",
"XML_RNGP_ELEM_CONTENT_ERROR", "XML_RNGP_ELEMENT_EMPTY", "XML_RNGP_ELEMENT_CONTENT",
"XML_RNGP_ELEMENT_NAME", "XML_RNGP_ELEMENT_NO_CONTENT", "XML_RNGP_ELEM_TEXT_CONFLICT",
"XML_RNGP_EMPTY", "XML_RNGP_EMPTY_CONSTRUCT", "XML_RNGP_EMPTY_CONTENT",
"XML_RNGP_EMPTY_NOT_EMPTY", "XML_RNGP_ERROR_TYPE_LIB", "XML_RNGP_EXCEPT_EMPTY",
"XML_RNGP_EXCEPT_MISSING", "XML_RNGP_EXCEPT_MULTIPLE", "XML_RNGP_EXCEPT_NO_CONTENT",
"XML_RNGP_EXTERNALREF_EMTPY", "XML_RNGP_EXTERNAL_REF_FAILURE",
"XML_RNGP_EXTERNALREF_RECURSE", "XML_RNGP_FORBIDDEN_ATTRIBUTE",
"XML_RNGP_FOREIGN_ELEMENT", "XML_RNGP_GRAMMAR_CONTENT", "XML_RNGP_GRAMMAR_EMPTY",
"XML_RNGP_GRAMMAR_MISSING", "XML_RNGP_GRAMMAR_NO_START", "XML_RNGP_GROUP_ATTR_CONFLICT",
"XML_RNGP_HREF_ERROR", "XML_RNGP_INCLUDE_EMPTY", "XML_RNGP_INCLUDE_FAILURE",
"XML_RNGP_INCLUDE_RECURSE", "XML_RNGP_INTERLEAVE_ADD", "XML_RNGP_INTERLEAVE_CREATE_FAILED",
"XML_RNGP_INTERLEAVE_EMPTY", "XML_RNGP_INTERLEAVE_NO_CONTENT",
"XML_RNGP_INVALID_DEFINE_NAME", "XML_RNGP_INVALID_URI", "XML_RNGP_INVALID_VALUE",
"XML_RNGP_MISSING_HREF", "XML_RNGP_NAME_MISSING", "XML_RNGP_NEED_COMBINE",
"XML_RNGP_NOTALLOWED_NOT_EMPTY", "XML_RNGP_NSNAME_ATTR_ANCESTOR",
"XML_RNGP_NSNAME_NO_NS", "XML_RNGP_PARAM_FORBIDDEN", "XML_RNGP_PARAM_NAME_MISSING",
"XML_RNGP_PARENTREF_CREATE_FAILED", "XML_RNGP_PARENTREF_NAME_INVALID",
"XML_RNGP_PARENTREF_NO_NAME", "XML_RNGP_PARENTREF_NO_PARENT",
"XML_RNGP_PARENTREF_NOT_EMPTY", "XML_RNGP_PARSE_ERROR", "XML_RNGP_PAT_ANYNAME_EXCEPT_ANYNAME",
"XML_RNGP_PAT_ATTR_ATTR", "XML_RNGP_PAT_ATTR_ELEM", "XML_RNGP_PAT_DATA_EXCEPT_ATTR",
"XML_RNGP_PAT_DATA_EXCEPT_ELEM", "XML_RNGP_PAT_DATA_EXCEPT_EMPTY",
"XML_RNGP_PAT_DATA_EXCEPT_GROUP", "XML_RNGP_PAT_DATA_EXCEPT_INTERLEAVE",
"XML_RNGP_PAT_DATA_EXCEPT_LIST", "XML_RNGP_PAT_DATA_EXCEPT_ONEMORE",
"XML_RNGP_PAT_DATA_EXCEPT_REF", "XML_RNGP_PAT_DATA_EXCEPT_TEXT",
"XML_RNGP_PAT_LIST_ATTR", "XML_RNGP_PAT_LIST_ELEM", "XML_RNGP_PAT_LIST_INTERLEAVE",
"XML_RNGP_PAT_LIST_LIST", "XML_RNGP_PAT_LIST_REF", "XML_RNGP_PAT_LIST_TEXT",
"XML_RNGP_PAT_NSNAME_EXCEPT_ANYNAME", "XML_RNGP_PAT_NSNAME_EXCEPT_NSNAME",
"XML_RNGP_PAT_ONEMORE_GROUP_ATTR", "XML_RNGP_PAT_ONEMORE_INTERLEAVE_ATTR",
"XML_RNGP_PAT_START_ATTR", "XML_RNGP_PAT_START_DATA", "XML_RNGP_PAT_START_EMPTY",
"XML_RNGP_PAT_START_GROUP", "XML_RNGP_PAT_START_INTERLEAVE",
"XML_RNGP_PAT_START_LIST", "XML_RNGP_PAT_START_ONEMORE", "XML_RNGP_PAT_START_TEXT",
"XML_RNGP_PAT_START_VALUE", "XML_RNGP_PREFIX_UNDEFINED", "XML_RNGP_REF_CREATE_FAILED",
"XML_RNGP_REF_CYCLE", "XML_RNGP_REF_NAME_INVALID", "XML_RNGP_REF_NO_DEF",
"XML_RNGP_REF_NO_NAME", "XML_RNGP_REF_NOT_EMPTY", "XML_RNGP_START_CHOICE_AND_INTERLEAVE",
"XML_RNGP_START_CONTENT", "XML_RNGP_START_EMPTY", "XML_RNGP_START_MISSING",
"XML_RNGP_TEXT_EXPECTED", "XML_RNGP_TEXT_HAS_CHILD", "XML_RNGP_TYPE_MISSING",
"XML_RNGP_TYPE_NOT_FOUND", "XML_RNGP_TYPE_VALUE", "XML_RNGP_UNKNOWN_ATTRIBUTE",
"XML_RNGP_UNKNOWN_COMBINE", "XML_RNGP_UNKNOWN_CONSTRUCT", "XML_RNGP_UNKNOWN_TYPE_LIB",
"XML_RNGP_URI_FRAGMENT", "XML_RNGP_URI_NOT_ABSOLUTE", "XML_RNGP_VALUE_EMPTY",
"XML_RNGP_VALUE_NO_CONTENT", "XML_RNGP_XMLNS_NAME", "XML_RNGP_XML_NS",
"XML_XPATH_EXPRESSION_OK", "XML_XPATH_NUMBER_ERROR", "XML_XPATH_UNFINISHED_LITERAL_ERROR",
"XML_XPATH_START_LITERAL_ERROR", "XML_XPATH_VARIABLE_REF_ERROR",
"XML_XPATH_UNDEF_VARIABLE_ERROR", "XML_XPATH_INVALID_PREDICATE_ERROR",
"XML_XPATH_EXPR_ERROR", "XML_XPATH_UNCLOSED_ERROR", "XML_XPATH_UNKNOWN_FUNC_ERROR",
"XML_XPATH_INVALID_OPERAND", "XML_XPATH_INVALID_TYPE", "XML_XPATH_INVALID_ARITY",
"XML_XPATH_INVALID_CTXT_SIZE", "XML_XPATH_INVALID_CTXT_POSITION",
"XML_XPATH_MEMORY_ERROR", "XML_XPTR_SYNTAX_ERROR", "XML_XPTR_RESOURCE_ERROR",
"XML_XPTR_SUB_RESOURCE_ERROR", "XML_XPATH_UNDEF_PREFIX_ERROR",
"XML_XPATH_ENCODING_ERROR", "XML_XPATH_INVALID_CHAR_ERROR", "XML_TREE_INVALID_HEX",
"XML_TREE_INVALID_DEC", "XML_TREE_UNTERMINATED_ENTITY", "XML_SAVE_NOT_UTF8",
"XML_SAVE_CHAR_INVALID", "XML_SAVE_NO_DOCTYPE", "XML_SAVE_UNKNOWN_ENCODING",
"XML_REGEXP_COMPILE_ERROR", "XML_IO_UNKNOWN", "XML_IO_EACCES",
"XML_IO_EAGAIN", "XML_IO_EBADF", "XML_IO_EBADMSG", "XML_IO_EBUSY",
"XML_IO_ECANCELED", "XML_IO_ECHILD", "XML_IO_EDEADLK", "XML_IO_EDOM",
"XML_IO_EEXIST", "XML_IO_EFAULT", "XML_IO_EFBIG", "XML_IO_EINPROGRESS",
"XML_IO_EINTR", "XML_IO_EINVAL", "XML_IO_EIO", "XML_IO_EISDIR",
"XML_IO_EMFILE", "XML_IO_EMLINK", "XML_IO_EMSGSIZE", "XML_IO_ENAMETOOLONG",
"XML_IO_ENFILE", "XML_IO_ENODEV", "XML_IO_ENOENT", "XML_IO_ENOEXEC",
"XML_IO_ENOLCK", "XML_IO_ENOMEM", "XML_IO_ENOSPC", "XML_IO_ENOSYS",
"XML_IO_ENOTDIR", "XML_IO_ENOTEMPTY", "XML_IO_ENOTSUP", "XML_IO_ENOTTY",
"XML_IO_ENXIO", "XML_IO_EPERM", "XML_IO_EPIPE", "XML_IO_ERANGE",
"XML_IO_EROFS", "XML_IO_ESPIPE", "XML_IO_ESRCH", "XML_IO_ETIMEDOUT",
"XML_IO_EXDEV", "XML_IO_NETWORK_ATTEMPT", "XML_IO_ENCODER", "XML_IO_FLUSH",
"XML_IO_WRITE", "XML_IO_NO_INPUT", "XML_IO_BUFFER_FULL", "XML_IO_LOAD_ERROR",
"XML_IO_ENOTSOCK", "XML_IO_EISCONN", "XML_IO_ECONNREFUSED", "XML_IO_ENETUNREACH",
"XML_IO_EADDRINUSE", "XML_IO_EALREADY", "XML_IO_EAFNOSUPPORT",
"XML_XINCLUDE_RECURSION", "XML_XINCLUDE_PARSE_VALUE", "XML_XINCLUDE_ENTITY_DEF_MISMATCH",
"XML_XINCLUDE_NO_HREF", "XML_XINCLUDE_NO_FALLBACK", "XML_XINCLUDE_HREF_URI",
"XML_XINCLUDE_TEXT_FRAGMENT", "XML_XINCLUDE_TEXT_DOCUMENT", "XML_XINCLUDE_INVALID_CHAR",
"XML_XINCLUDE_BUILD_FAILED", "XML_XINCLUDE_UNKNOWN_ENCODING",
"XML_XINCLUDE_MULTIPLE_ROOT", "XML_XINCLUDE_XPTR_FAILED", "XML_XINCLUDE_XPTR_RESULT",
"XML_XINCLUDE_INCLUDE_IN_INCLUDE", "XML_XINCLUDE_FALLBACKS_IN_INCLUDE",
"XML_XINCLUDE_FALLBACK_NOT_IN_INCLUDE", "XML_XINCLUDE_DEPRECATED_NS",
"XML_XINCLUDE_FRAGMENT_ID", "XML_CATALOG_MISSING_ATTR", "XML_CATALOG_ENTRY_BROKEN",
"XML_CATALOG_PREFER_VALUE", "XML_CATALOG_NOT_CATALOG", "XML_CATALOG_RECURSION",
"XML_SCHEMAP_PREFIX_UNDEFINED", "XML_SCHEMAP_ATTRFORMDEFAULT_VALUE",
"XML_SCHEMAP_ATTRGRP_NONAME_NOREF", "XML_SCHEMAP_ATTR_NONAME_NOREF",
"XML_SCHEMAP_COMPLEXTYPE_NONAME_NOREF", "XML_SCHEMAP_ELEMFORMDEFAULT_VALUE",
"XML_SCHEMAP_ELEM_NONAME_NOREF", "XML_SCHEMAP_EXTENSION_NO_BASE",
"XML_SCHEMAP_FACET_NO_VALUE", "XML_SCHEMAP_FAILED_BUILD_IMPORT",
"XML_SCHEMAP_GROUP_NONAME_NOREF", "XML_SCHEMAP_IMPORT_NAMESPACE_NOT_URI",
"XML_SCHEMAP_IMPORT_REDEFINE_NSNAME", "XML_SCHEMAP_IMPORT_SCHEMA_NOT_URI",
"XML_SCHEMAP_INVALID_BOOLEAN", "XML_SCHEMAP_INVALID_ENUM", "XML_SCHEMAP_INVALID_FACET",
"XML_SCHEMAP_INVALID_FACET_VALUE", "XML_SCHEMAP_INVALID_MAXOCCURS",
"XML_SCHEMAP_INVALID_MINOCCURS", "XML_SCHEMAP_INVALID_REF_AND_SUBTYPE",
"XML_SCHEMAP_INVALID_WHITE_SPACE", "XML_SCHEMAP_NOATTR_NOREF",
"XML_SCHEMAP_NOTATION_NO_NAME", "XML_SCHEMAP_NOTYPE_NOREF", "XML_SCHEMAP_REF_AND_SUBTYPE",
"XML_SCHEMAP_RESTRICTION_NONAME_NOREF", "XML_SCHEMAP_SIMPLETYPE_NONAME",
"XML_SCHEMAP_TYPE_AND_SUBTYPE", "XML_SCHEMAP_UNKNOWN_ALL_CHILD",
"XML_SCHEMAP_UNKNOWN_ANYATTRIBUTE_CHILD", "XML_SCHEMAP_UNKNOWN_ATTR_CHILD",
"XML_SCHEMAP_UNKNOWN_ATTRGRP_CHILD", "XML_SCHEMAP_UNKNOWN_ATTRIBUTE_GROUP",
"XML_SCHEMAP_UNKNOWN_BASE_TYPE", "XML_SCHEMAP_UNKNOWN_CHOICE_CHILD",
"XML_SCHEMAP_UNKNOWN_COMPLEXCONTENT_CHILD", "XML_SCHEMAP_UNKNOWN_COMPLEXTYPE_CHILD",
"XML_SCHEMAP_UNKNOWN_ELEM_CHILD", "XML_SCHEMAP_UNKNOWN_EXTENSION_CHILD",
"XML_SCHEMAP_UNKNOWN_FACET_CHILD", "XML_SCHEMAP_UNKNOWN_FACET_TYPE",
"XML_SCHEMAP_UNKNOWN_GROUP_CHILD", "XML_SCHEMAP_UNKNOWN_IMPORT_CHILD",
"XML_SCHEMAP_UNKNOWN_LIST_CHILD", "XML_SCHEMAP_UNKNOWN_NOTATION_CHILD",
"XML_SCHEMAP_UNKNOWN_PROCESSCONTENT_CHILD", "XML_SCHEMAP_UNKNOWN_REF",
"XML_SCHEMAP_UNKNOWN_RESTRICTION_CHILD", "XML_SCHEMAP_UNKNOWN_SCHEMAS_CHILD",
"XML_SCHEMAP_UNKNOWN_SEQUENCE_CHILD", "XML_SCHEMAP_UNKNOWN_SIMPLECONTENT_CHILD",
"XML_SCHEMAP_UNKNOWN_SIMPLETYPE_CHILD", "XML_SCHEMAP_UNKNOWN_TYPE",
"XML_SCHEMAP_UNKNOWN_UNION_CHILD", "XML_SCHEMAP_ELEM_DEFAULT_FIXED",
"XML_SCHEMAP_REGEXP_INVALID", "XML_SCHEMAP_FAILED_LOAD", "XML_SCHEMAP_NOTHING_TO_PARSE",
"XML_SCHEMAP_NOROOT", "XML_SCHEMAP_REDEFINED_GROUP", "XML_SCHEMAP_REDEFINED_TYPE",
"XML_SCHEMAP_REDEFINED_ELEMENT", "XML_SCHEMAP_REDEFINED_ATTRGROUP",
"XML_SCHEMAP_REDEFINED_ATTR", "XML_SCHEMAP_REDEFINED_NOTATION",
"XML_SCHEMAP_FAILED_PARSE", "XML_SCHEMAP_UNKNOWN_PREFIX", "XML_SCHEMAP_DEF_AND_PREFIX",
"XML_SCHEMAP_UNKNOWN_INCLUDE_CHILD", "XML_SCHEMAP_INCLUDE_SCHEMA_NOT_URI",
"XML_SCHEMAP_INCLUDE_SCHEMA_NO_URI", "XML_SCHEMAP_NOT_SCHEMA",
"XML_SCHEMAP_UNKNOWN_MEMBER_TYPE", "XML_SCHEMAP_INVALID_ATTR_USE",
"XML_SCHEMAP_RECURSIVE", "XML_SCHEMAP_SUPERNUMEROUS_LIST_ITEM_TYPE",
"XML_SCHEMAP_INVALID_ATTR_COMBINATION", "XML_SCHEMAP_INVALID_ATTR_INLINE_COMBINATION",
"XML_SCHEMAP_MISSING_SIMPLETYPE_CHILD", "XML_SCHEMAP_INVALID_ATTR_NAME",
"XML_SCHEMAP_REF_AND_CONTENT", "XML_SCHEMAP_CT_PROPS_CORRECT_1",
"XML_SCHEMAP_CT_PROPS_CORRECT_2", "XML_SCHEMAP_CT_PROPS_CORRECT_3",
"XML_SCHEMAP_CT_PROPS_CORRECT_4", "XML_SCHEMAP_CT_PROPS_CORRECT_5",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_1", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_1_1",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_1_2", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_2",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_3", "XML_SCHEMAP_WILDCARD_INVALID_NS_MEMBER",
"XML_SCHEMAP_INTERSECTION_NOT_EXPRESSIBLE", "XML_SCHEMAP_UNION_NOT_EXPRESSIBLE",
"XML_SCHEMAP_SRC_IMPORT_3_1", "XML_SCHEMAP_SRC_IMPORT_3_2", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_4_1",
"XML_SCHEMAP_DERIVATION_OK_RESTRICTION_4_2", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_4_3",
"XML_SCHEMAP_COS_CT_EXTENDS_1_3", "XML_SCHEMAV_NOROOT", "XML_SCHEMAV_UNDECLAREDELEM",
"XML_SCHEMAV_NOTTOPLEVEL", "XML_SCHEMAV_MISSING", "XML_SCHEMAV_WRONGELEM",
"XML_SCHEMAV_NOTYPE", "XML_SCHEMAV_NOROLLBACK", "XML_SCHEMAV_ISABSTRACT",
"XML_SCHEMAV_NOTEMPTY", "XML_SCHEMAV_ELEMCONT", "XML_SCHEMAV_HAVEDEFAULT",
"XML_SCHEMAV_NOTNILLABLE", "XML_SCHEMAV_EXTRACONTENT", "XML_SCHEMAV_INVALIDATTR",
"XML_SCHEMAV_INVALIDELEM", "XML_SCHEMAV_NOTDETERMINIST", "XML_SCHEMAV_CONSTRUCT",
"XML_SCHEMAV_INTERNAL", "XML_SCHEMAV_NOTSIMPLE", "XML_SCHEMAV_ATTRUNKNOWN",
"XML_SCHEMAV_ATTRINVALID", "XML_SCHEMAV_VALUE", "XML_SCHEMAV_FACET",
"XML_SCHEMAV_CVC_DATATYPE_VALID_1_2_1", "XML_SCHEMAV_CVC_DATATYPE_VALID_1_2_2",
"XML_SCHEMAV_CVC_DATATYPE_VALID_1_2_3", "XML_SCHEMAV_CVC_TYPE_3_1_1",
"XML_SCHEMAV_CVC_TYPE_3_1_2", "XML_SCHEMAV_CVC_FACET_VALID",
"XML_SCHEMAV_CVC_LENGTH_VALID", "XML_SCHEMAV_CVC_MINLENGTH_VALID",
"XML_SCHEMAV_CVC_MAXLENGTH_VALID", "XML_SCHEMAV_CVC_MININCLUSIVE_VALID",
"XML_SCHEMAV_CVC_MAXINCLUSIVE_VALID", "XML_SCHEMAV_CVC_MINEXCLUSIVE_VALID",
"XML_SCHEMAV_CVC_MAXEXCLUSIVE_VALID", "XML_SCHEMAV_CVC_TOTALDIGITS_VALID",
"XML_SCHEMAV_CVC_FRACTIONDIGITS_VALID", "XML_SCHEMAV_CVC_PATTERN_VALID",
"XML_SCHEMAV_CVC_ENUMERATION_VALID", "XML_SCHEMAV_CVC_COMPLEX_TYPE_2_1",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_2_2", "XML_SCHEMAV_CVC_COMPLEX_TYPE_2_3",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_2_4", "XML_SCHEMAV_CVC_ELT_1",
"XML_SCHEMAV_CVC_ELT_2", "XML_SCHEMAV_CVC_ELT_3_1", "XML_SCHEMAV_CVC_ELT_3_2_1",
"XML_SCHEMAV_CVC_ELT_3_2_2", "XML_SCHEMAV_CVC_ELT_4_1", "XML_SCHEMAV_CVC_ELT_4_2",
"XML_SCHEMAV_CVC_ELT_4_3", "XML_SCHEMAV_CVC_ELT_5_1_1", "XML_SCHEMAV_CVC_ELT_5_1_2",
"XML_SCHEMAV_CVC_ELT_5_2_1", "XML_SCHEMAV_CVC_ELT_5_2_2_1", "XML_SCHEMAV_CVC_ELT_5_2_2_2_1",
"XML_SCHEMAV_CVC_ELT_5_2_2_2_2", "XML_SCHEMAV_CVC_ELT_6", "XML_SCHEMAV_CVC_ELT_7",
"XML_SCHEMAV_CVC_ATTRIBUTE_1", "XML_SCHEMAV_CVC_ATTRIBUTE_2",
"XML_SCHEMAV_CVC_ATTRIBUTE_3", "XML_SCHEMAV_CVC_ATTRIBUTE_4",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_3_1", "XML_SCHEMAV_CVC_COMPLEX_TYPE_3_2_1",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_3_2_2", "XML_SCHEMAV_CVC_COMPLEX_TYPE_4",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_5_1", "XML_SCHEMAV_CVC_COMPLEX_TYPE_5_2",
"XML_SCHEMAV_ELEMENT_CONTENT", "XML_SCHEMAV_DOCUMENT_ELEMENT_MISSING",
"XML_SCHEMAV_CVC_COMPLEX_TYPE_1", "XML_SCHEMAV_CVC_AU", "XML_SCHEMAV_CVC_TYPE_1",
"XML_SCHEMAV_CVC_TYPE_2", "XML_SCHEMAV_CVC_IDC", "XML_SCHEMAV_CVC_WILDCARD",
"XML_SCHEMAV_MISC", "XML_XPTR_UNKNOWN_SCHEME", "XML_XPTR_CHILDSEQ_START",
"XML_XPTR_EVAL_FAILED", "XML_XPTR_EXTRA_OBJECTS", "XML_C14N_CREATE_CTXT",
"XML_C14N_REQUIRES_UTF8", "XML_C14N_CREATE_STACK", "XML_C14N_INVALID_NODE",
"XML_C14N_UNKNOW_NODE", "XML_C14N_RELATIVE_NAMESPACE", "XML_FTP_PASV_ANSWER",
"XML_FTP_EPSV_ANSWER", "XML_FTP_ACCNT", "XML_FTP_URL_SYNTAX",
"XML_HTTP_URL_SYNTAX", "XML_HTTP_USE_IP", "XML_HTTP_UNKNOWN_HOST",
"XML_SCHEMAP_SRC_SIMPLE_TYPE_1", "XML_SCHEMAP_SRC_SIMPLE_TYPE_2",
"XML_SCHEMAP_SRC_SIMPLE_TYPE_3", "XML_SCHEMAP_SRC_SIMPLE_TYPE_4",
"XML_SCHEMAP_SRC_RESOLVE", "XML_SCHEMAP_SRC_RESTRICTION_BASE_OR_SIMPLETYPE",
"XML_SCHEMAP_SRC_LIST_ITEMTYPE_OR_SIMPLETYPE", "XML_SCHEMAP_SRC_UNION_MEMBERTYPES_OR_SIMPLETYPES",
"XML_SCHEMAP_ST_PROPS_CORRECT_1", "XML_SCHEMAP_ST_PROPS_CORRECT_2",
"XML_SCHEMAP_ST_PROPS_CORRECT_3", "XML_SCHEMAP_COS_ST_RESTRICTS_1_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_1_2", "XML_SCHEMAP_COS_ST_RESTRICTS_1_3_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_1_3_2", "XML_SCHEMAP_COS_ST_RESTRICTS_2_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_1_1", "XML_SCHEMAP_COS_ST_RESTRICTS_2_3_1_2",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_1", "XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_2",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_3", "XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_4",
"XML_SCHEMAP_COS_ST_RESTRICTS_2_3_2_5", "XML_SCHEMAP_COS_ST_RESTRICTS_3_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_1", "XML_SCHEMAP_COS_ST_RESTRICTS_3_3_1_2",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_2", "XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_1",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_3", "XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_4",
"XML_SCHEMAP_COS_ST_RESTRICTS_3_3_2_5", "XML_SCHEMAP_COS_ST_DERIVED_OK_2_1",
"XML_SCHEMAP_COS_ST_DERIVED_OK_2_2", "XML_SCHEMAP_S4S_ELEM_NOT_ALLOWED",
"XML_SCHEMAP_S4S_ELEM_MISSING", "XML_SCHEMAP_S4S_ATTR_NOT_ALLOWED",
"XML_SCHEMAP_S4S_ATTR_MISSING", "XML_SCHEMAP_S4S_ATTR_INVALID_VALUE",
"XML_SCHEMAP_SRC_ELEMENT_1", "XML_SCHEMAP_SRC_ELEMENT_2_1", "XML_SCHEMAP_SRC_ELEMENT_2_2",
"XML_SCHEMAP_SRC_ELEMENT_3", "XML_SCHEMAP_P_PROPS_CORRECT_1",
"XML_SCHEMAP_P_PROPS_CORRECT_2_1", "XML_SCHEMAP_P_PROPS_CORRECT_2_2",
"XML_SCHEMAP_E_PROPS_CORRECT_2", "XML_SCHEMAP_E_PROPS_CORRECT_3",
"XML_SCHEMAP_E_PROPS_CORRECT_4", "XML_SCHEMAP_E_PROPS_CORRECT_5",
"XML_SCHEMAP_E_PROPS_CORRECT_6", "XML_SCHEMAP_SRC_INCLUDE", "XML_SCHEMAP_SRC_ATTRIBUTE_1",
"XML_SCHEMAP_SRC_ATTRIBUTE_2", "XML_SCHEMAP_SRC_ATTRIBUTE_3_1",
"XML_SCHEMAP_SRC_ATTRIBUTE_3_2", "XML_SCHEMAP_SRC_ATTRIBUTE_4",
"XML_SCHEMAP_NO_XMLNS", "XML_SCHEMAP_NO_XSI", "XML_SCHEMAP_COS_VALID_DEFAULT_1",
"XML_SCHEMAP_COS_VALID_DEFAULT_2_1", "XML_SCHEMAP_COS_VALID_DEFAULT_2_2_1",
"XML_SCHEMAP_COS_VALID_DEFAULT_2_2_2", "XML_SCHEMAP_CVC_SIMPLE_TYPE",
"XML_SCHEMAP_COS_CT_EXTENDS_1_1", "XML_SCHEMAP_SRC_IMPORT_1_1",
"XML_SCHEMAP_SRC_IMPORT_1_2", "XML_SCHEMAP_SRC_IMPORT_2", "XML_SCHEMAP_SRC_IMPORT_2_1",
"XML_SCHEMAP_SRC_IMPORT_2_2", "XML_SCHEMAP_INTERNAL", "XML_SCHEMAP_NOT_DETERMINISTIC",
"XML_SCHEMAP_SRC_ATTRIBUTE_GROUP_1", "XML_SCHEMAP_SRC_ATTRIBUTE_GROUP_2",
"XML_SCHEMAP_SRC_ATTRIBUTE_GROUP_3", "XML_SCHEMAP_MG_PROPS_CORRECT_1",
"XML_SCHEMAP_MG_PROPS_CORRECT_2", "XML_SCHEMAP_SRC_CT_1", "XML_SCHEMAP_DERIVATION_OK_RESTRICTION_2_1_3",
"XML_SCHEMAP_AU_PROPS_CORRECT_2", "XML_SCHEMAP_A_PROPS_CORRECT_2",
"XML_SCHEMAP_C_PROPS_CORRECT", "XML_SCHEMAP_SRC_REDEFINE", "XML_SCHEMAP_SRC_IMPORT",
"XML_SCHEMAP_WARN_SKIP_SCHEMA", "XML_SCHEMAP_WARN_UNLOCATED_SCHEMA",
"XML_SCHEMAP_WARN_ATTR_REDECL_PROH", "XML_SCHEMAP_WARN_ATTR_POINTLESS_PROH",
"XML_SCHEMAP_AG_PROPS_CORRECT", "XML_SCHEMAP_COS_CT_EXTENDS_1_2",
"XML_SCHEMAP_AU_PROPS_CORRECT", "XML_SCHEMAP_A_PROPS_CORRECT_3",
"XML_SCHEMAP_COS_ALL_LIMITED", "XML_MODULE_OPEN", "XML_MODULE_CLOSE",
"XML_CHECK_FOUND_ELEMENT", "XML_CHECK_FOUND_ATTRIBUTE", "XML_CHECK_FOUND_TEXT",
"XML_CHECK_FOUND_CDATA", "XML_CHECK_FOUND_ENTITYREF", "XML_CHECK_FOUND_ENTITY",
"XML_CHECK_FOUND_PI", "XML_CHECK_FOUND_COMMENT", "XML_CHECK_FOUND_DOCTYPE",
"XML_CHECK_FOUND_FRAGMENT", "XML_CHECK_FOUND_NOTATION", "XML_CHECK_UNKNOWN_NODE",
"XML_CHECK_ENTITY_TYPE", "XML_CHECK_NO_PARENT", "XML_CHECK_NO_DOC",
"XML_CHECK_NO_NAME", "XML_CHECK_NO_ELEM", "XML_CHECK_WRONG_DOC",
"XML_CHECK_NO_PREV", "XML_CHECK_WRONG_PREV", "XML_CHECK_NO_NEXT",
"XML_CHECK_WRONG_NEXT", "XML_CHECK_NOT_DTD", "XML_CHECK_NOT_ATTR",
"XML_CHECK_NOT_ATTR_DECL", "XML_CHECK_NOT_ELEM_DECL", "XML_CHECK_NOT_ENTITY_DECL",
"XML_CHECK_NOT_NS_DECL", "XML_CHECK_NO_HREF", "XML_CHECK_WRONG_PARENT",
"XML_CHECK_NS_SCOPE", "XML_CHECK_NS_ANCESTOR", "XML_CHECK_NOT_UTF8",
"XML_CHECK_NO_DICT", "XML_CHECK_NOT_NCNAME", "XML_CHECK_OUTSIDE_DICT",
"XML_CHECK_WRONG_NAME", "XML_CHECK_NAME_NOT_NULL", "XML_I18N_NO_NAME",
"XML_I18N_NO_HANDLER", "XML_I18N_EXCESS_HANDLER", "XML_I18N_CONV_FAILED",
"XML_I18N_NO_OUTPUT"))
XML/R/createNode.R 0000644 0001751 0000144 00000006777 13607633702 013355 0 ustar hornik users xmlNode <-
function(name, ..., attrs = NULL, namespace = "",
namespaceDefinitions = NULL, .children = list(...))
{
kids <- lapply(.children, asXMLNode)
kids = addNames(kids)
# Check the names paces
node <- list(name = name, attributes = attrs, children = kids, namespace=namespace,
namespaceDefinitions = as(namespaceDefinitions, "XMLNamespaceDefinitions"))
class(node) <- oldClass("XMLNode") # , "XMLAbstractNode")
node
}
setAs("NULL", "XMLNamespaceDefinitions", function(from) structure(list(), class = "XMLNamespaceDefinitions"))
addNames =
function(kids, fromTag = TRUE)
{
if(fromTag)
names(kids) = sapply(kids, xmlName)
else if(length(names(kids)) == 0)
names(kids) <- sapply(kids,xmlName)
else if(any( idx <- names(kids) == ""))
names(kids)[idx] <- sapply(kids[idx], xmlName)
kids
}
setGeneric("xmlChildren<-",
function(x, ..., value) {
standardGeneric("xmlChildren<-")
})
setMethod("xmlChildren<-", "ANY",
function(x, append = FALSE, ..., value) {
#value = addNames(value)
#x$children <- value
addChildren(x, append = append, ..., kids = value)
})
# Don't need this.
# xmlName.character =
# function(node, full = FALSE)
# "text"
setMethod("xmlChildren<-", "XMLInternalNode",
function(x, append = TRUE, ..., value) {
if(!append)
removeNodes(xmlChildren(x))
if(!is.list(value))
value = list(value)
addChildren(x, kids = value)
})
addChildren =
function(node, ..., kids = list(...), at = NA, cdata = FALSE, append = TRUE)
UseMethod("addChildren")
addChildren.XMLNode =
function(node, ..., kids = list(...), at = NA, cdata = FALSE, append = TRUE)
{
kids = lapply(kids,
function(i) {
if(!inherits(i, "XMLNode"))
xmlTextNode(as.character(i), cdata = cdata)
else
i
})
node$children = if(append) c(node$children, kids) else kids
node$children = addNames(node$children)
node
}
# It would be better tokenize this, but ...
XMLEntities =
c("&" = "amp", # order is important as if we insert an entity, then the next time we will see the &.
">" = "gt",
"<" = "lt",
"'" = "apos",
'"' = "quot")
insertEntities =
function(value, entities = XMLEntities)
{
pat = names(entities)
subs = paste("&", entities, ";", sep = "")
for(i in seq(along = entities))
value = gsub(pat[i], subs[i], value)
value
}
xmlTextNode <-
function(value, namespace = "", entities = XMLEntities, cdata = FALSE)
{
node <- xmlNode("text", namespace = namespace)
if(length(entities) && !inherits(value, "AsIs"))
value = insertEntities(value, XMLEntities)
if(cdata)
value = xmlCDataNode(value)
node$value <- value
if(!cdata)
class(node) <- oldClass("XMLTextNode") # , class(node))
if(length(entities))
class(node) <- c(class(node), "EntitiesEscaped") #"XMLEntitiesEscapedTextNode"
node
}
xmlPINode <-
function(sys, value, namespace="")
{
x <- xmlNode(name=sys, namespace=namespace)
x$value <- value
class(x) <- oldClass("XMLProcessingInstruction") # , class(x))
x
}
xmlCommentNode <-
function(text)
{
node <- xmlTextNode(text)
class(node) <- oldClass("XMLCommentNode") # , class(node))
node
}
xmlCDataNode <-
function(...)
{
txt <- paste(..., collapse="")
node <- xmlNode("text")
node$value <- txt
class(node) <- oldClass("XMLCDataNode") # , class(node))
node
}
asXMLNode <-
function(x)
{
#XXX
if(!inherits(x, "XMLNode")) {
xmlTextNode(x)
} else {
x
}
}
XML/R/xmlEventHandler.R 0000644 0001751 0000144 00000001775 13607633674 014405 0 ustar hornik users xmlEventHandler <-
function() {
con <- xmlOutputDOM()
startElement <- function(name, atts,...) {
con$addTag(name, attrs=atts, close=FALSE)
}
endElement <- function(name) {
con$closeTag(name)
}
text <- function(x,...) {
con$addNode(xmlTextNode(x))
}
comment <- function(x,...) {
xmlCommentNode(x)
}
externalEntity <- function(ctxt, baseURI, sysId, publicId,...) {
cat("externalEntity", ctxt, baseURI, sysId, publicId,"\n")
}
entityDeclaration <- function(name, baseURI, sysId, publicId, notation, ...) {
# just writing to screen at this point.
cat("externalEntity", name, baseURI, sysId, publicId, notation,"\n")
}
processingInstruction <- function(sys, value) {
con$addNode(xmlPINode(sys, value))
}
list(startElement=startElement, endElement=endElement,
processingInstruction=processingInstruction,
text=text,
comment=comment,
externalEntity=externalEntity,
entityDeclaration=entityDeclaration,
dom=function(){con})
}
XML/R/DTDRef.R 0000644 0001751 0000144 00000004636 13607633665 012354 0 ustar hornik users # These are classes and facilities for referring to a DTD for the
# DOCTYPE field of an XML document
# The 4 elements are + or -//creator//name of what is being referenced//language (decribed by ISO639)
# See XML Elements of Style by Simon St. Laurent.
validatePublicIdentifier =
function(object)
{
els = strsplit(object, "//")[[1]]
if(length(els) != 4)
return("a PUBLIC identifier must have 4 parts, separated by //")
if(! (els[1] %in% c("+", "-")))
return("first element of PUBLIC identifier must be + or -")
TRUE
}
setClass("DTDPublicIdentifier",
contains = "character",
validity = validatePublicIdentifier)
# name is the node name for the top-level node.
# system and public identify the DTD.
setClass("Doctype", representation(name = "character",
system = "character",
public = "character"),
validity = function(object) {
if(length(nchar(object@system)) > 0 && length(object@public) > 0)
return("only one of system and public can be specified")
if(length(object@public) > 0 && length(object@public) != 2)
return("the public part of the Doctype must have exactly 2 elements.")
if(length(object@public) > 0) {
tmp = validatePublicIdentifier(object@public[1])
if(!is.logical(tmp))
return(tmp)
}
TRUE
})
Doctype =
function(system = character(), public = character(), name = "")
{
if(length(public) == 1 && length(system) > 0) {
public = c(public, system)
system = character()
}
new("Doctype", name = name, system = system, public = public)
}
ddQuote =
function(x)
{
if(length(x) == 0)
return(character())
paste('"', x, '"', sep = "")
}
setAs("Doctype", "character",
function(from) {
extra = character()
if(sum(nchar(from@public), nchar(from@system))) {
if(length(from@system))
extra = c(extra, "SYSTEM", ddQuote(from@system))
if(length(from@public))
extra = c(extra, "PUBLIC", ddQuote(from@public))
}
paste("")
})
XML/R/xmlNodes.R 0000644 0001751 0000144 00000134216 13610555150 013055 0 ustar hornik users #xmlRoot.HTMLInternalDocument =
xmlRoot.XMLInternalDocument =
function(x, skip = TRUE, addFinalizer = NA, ...)
{
.Call("R_xmlRootNode", x, as.logical(skip), addFinalizer, PACKAGE = "XML")
}
setAs("XMLNode", "XMLInternalNode",
function(from) {
con = textConnection("tmp", "w", local = TRUE)
sink(con)
on.exit({sink(file = NULL); close(con)})
print(from)
doc = xmlParse(tmp, asText = TRUE)
node = xmlRoot(doc)
removeChildren(node)
node
}
)
setAs("XMLInternalDocument", "character", function(from) saveXML(from))
setAs("XMLInternalDOM", "character", function(from) saveXML(from))
setAs("XMLInternalDocument", "XMLInternalNode",
function(from) xmlRoot(from))
setAs("XMLInternalNode", "XMLInternalDocument",
function(from) {
doc = .Call("R_getXMLNodeDocument", from, PACKAGE = "XML")
addDocFinalizer(doc, TRUE)
if(is(doc, "HTMLInternalDocument"))
class(doc) = c(class(doc), "XMLInternalDocument", "XMLAbstractDocument")
doc
})
setGeneric("free", function(obj) standardGeneric("free"))
setMethod("free", "XMLInternalDocument",
function(obj) {
invisible(.Call("R_XMLInternalDocument_free", obj, PACKAGE = "XML"))
})
addFinalizer =
function(obj, fun, ...)
{
UseMethod("addFinalizer")
}
addCFinalizer.XMLInternalDocument =
function(obj, fun, ...)
{
if(missing(fun) || fun == NULL)
fun = getNativeSymbolInfo("RSXML_free_internal_document")$address
else if(!is.function(obj)) {
}
.Call("R_addXMLInternalDocument_finalizer", obj, fun, PACKAGE = "XML")
}
asRXMLNode =
function(node, converters = NULL, trim = TRUE, ignoreBlanks = TRUE)
{
.Call("R_createXMLNode", node, converters, as.logical(trim), as.logical(ignoreBlanks), PACKAGE = "XML")[[1]]
}
"[.XMLInternalDocument" =
function(x, i, j, ..., namespaces = xmlNamespaceDefinitions(x, simplify = TRUE), addFinalizer = NA)
{
if(is.character(i)) {
getNodeSet(x, i, ..., addFinalizer = addFinalizer)
} else
stop("No method for subsetting an XMLInternalDocument with ", class(i))
}
"[[.XMLInternalDocument" =
function(x, i, j, ..., exact = NA, namespaces = xmlNamespaceDefinitions(x, simplify = TRUE),
addFinalizer = NA)
{
ans = x[i, addFinalizer = addFinalizer]
if(length(ans) > 1)
warning(length(ans), " elements in node set. Returning just the first one! (Use [])")
ans[[1]]
}
xmlName.XMLInternalNode =
function(node, full = FALSE)
{
ans = .Call("RS_XML_xmlNodeName", node, PACKAGE = "XML")
if((is.logical(full) && full) || (!is.logical(full) && length(full))) {
tmp = xmlNamespace(node)
if(length(tmp) && length(names(tmp)) > 0 && names(tmp) != "")
ans = paste(names(tmp), ans, sep = ":")
else if(is.character(full) && full != "")
ans = paste(full, ans, sep = ":")
}
ans
}
if(useS4)
setMethod("xmlName", "XMLInternalNode", xmlName.XMLInternalNode)
xmlNamespace.XMLInternalNode =
function(x)
{
.Call("RS_XML_xmlNodeNamespace", x, PACKAGE = "XML")
}
xmlAttrs.XMLInternalNode =
function(node, addNamespacePrefix = FALSE, addNamespaceURLs = TRUE, ...)
{
ans = .Call("RS_XML_xmlNodeAttributes", node, as.logical(addNamespacePrefix), as.logical(addNamespaceURLs), PACKAGE = "XML")
if(length(attr(ans, "namespaces")))
ans = new("XMLAttributes", ans) # class(ans) = "XMLAttributes"
ans
}
#setOldClass(c("XMLAttributes", "character"))
setClass("XMLAttributes", contains = "character")
setMethod("show", "XMLAttributes",
function(object)
print(unclass(object)))
setMethod('[', c('XMLAttributes', "ANY"),
function(x, i, j, ...)
{
ans = callNextMethod()
i = match(i, names(x))
structure(ans, namespaces = attr(x, "namespaces")[i], class = class(x))
})
xmlChildren.XMLInternalNode =
function(x, addNames = TRUE, omitNodeTypes = c("XMLXIncludeStartNode", "XMLXIncludeEndNode"), addFinalizer = NA, ...)
{
kids = .Call("RS_XML_xmlNodeChildrenReferences", x, as.logical(addNames), addFinalizer, PACKAGE = "XML")
if(length(omitNodeTypes))
kids = kids[! sapply(kids, function(x) any(inherits(x, omitNodeTypes)) )]
structure(kids, class = c("XMLInternalNodeList", "XMLNodeList"))
}
xmlChildren.XMLInternalDocument =
function(x, addNames = TRUE, ...)
{
# .Call("RS_XML_xmlDocumentChildren", x, as.logical(addNames), PACKAGE = "XML")
xmlChildren.XMLInternalNode(x, addNames, ...)
}
if(useS4) {
setMethod("xmlAttrs", "XMLInternalNode", xmlAttrs.XMLInternalNode)
setMethod("xmlChildren", "XMLInternalNode", xmlChildren.XMLInternalNode)
setMethod("xmlChildren", "XMLInternalDocument", xmlChildren.XMLInternalDocument)
}
xmlSize.XMLInternalNode =
function(obj)
.Call("RS_XML_xmlNodeNumChildren", obj, PACKAGE = "XML")
"[[.XMLInternalNode" <-
#setMethod("[[", "XMLInternalNode",
function(x, i, j, ..., addFinalizer = NA)
{
if(inherits(i, "formula")) {
return(getNodeSet(x, i, if(missing(j)) character() else j, addFinalizer = addFinalizer, ...)[[1]])
}
if(is.na(i))
return(NULL)
# Get the individual elements rather than all the children and then subset those
return(
if(is(i, "numeric"))
.Call("R_getChildByIndex", x, as.integer(i), as.logical(addFinalizer), PACKAGE = "XML")
else
.Call("R_getChildByName", x, as.character(i), as.logical(addFinalizer), PACKAGE = "XML")
)
kids = xmlChildren(x, addFinalizer = addFinalizer)
if(length(kids) == 0)
return(NULL)
if(is.numeric(i))
kids[[i]]
else {
id = as.character(i)
which = match(id, sapply(kids, xmlName))
kids[[which]]
}
}
"[.XMLInternalNode" <-
function(x, i, j, ..., addFinalizer = NA)
{
kids = xmlChildren(x, addFinalizer = addFinalizer)
if(is.logical(i))
i = which(i)
if(is(i, "numeric"))
structure(kids[i], class = c("XMLInternalNodeList", "XMLNodeList"))
else {
id = as.character(i)
which = match(sapply(kids, xmlName), id)
structure(kids[!is.na(which)], class = c("XMLInternalNodeList", "XMLNodeList"))
}
}
xmlValue.XMLInternalNode =
function(x, ignoreComments = FALSE, recursive = TRUE, encoding = getEncoding(x), trim = FALSE) #CE_NATIVE)
{
encoding = if(is.integer(encoding))
encoding
else
getEncodingREnum(encoding)
if(!recursive) {
if(xmlSize(x) == 0)
return(character())
kids = xmlChildren(x, addFinaliizer = FALSE)
i = sapply(kids, inherits, "XMLInternalTextNode")
if(any(i))
return(paste(unlist(lapply(kids[i], xmlValue, ignoreComments, recursive = TRUE, encoding = encoding, trim = trim)), collapse = ""))
else
return(character())
}
ans = .Call("R_xmlNodeValue", x, NULL, encoding, PACKAGE = "XML") # 2nd argument ignored.
if(trim)
trim(ans)
else
ans
}
setS3Method("xmlValue", "XMLInternalNode")
setGeneric("xmlValue<-", function(x, ..., value) standardGeneric("xmlValue<-"))
setMethod("xmlValue<-", "XMLInternalTextNode",
function(x, ..., value) {
.Call("R_setXMLInternalTextNode_value", x, as.character(value), PACKAGE = "XML")
x
})
setMethod("xmlValue<-", "XMLTextNode",
function(x, ..., value) {
x$value = as.character(value)
x
})
setMethod("xmlValue<-", "XMLAbstractNode",
function(x, ..., value) {
if(xmlSize(x) == 0) {
x = addChildren(x, as.character(value))
} else if(xmlSize(x) == 1 && any(inherits(x[[1]], c("XMLTextNode", "XMLInternalTextNode")))) {
#XXX Fix the assignment to children.
# should be xmlValue(x[[1]]) = value
tmp = x[[1]]
xmlValue(tmp) = as.character(value)
if(inherits(x[[1]], "XMLTextNode"))
x$children[[1]] = tmp
} else
stop("Cannot set the content of a node that is not an XMLInternalTextNode or a node containing a text node")
x
})
names.XMLInternalNode =
function(x)
xmlSApply(x, xmlName, addFinalizer = FALSE)
xmlApply.XMLInternalNode =
function(X, FUN, ..., omitNodeTypes = c("XMLXIncludeStartNode", "XMLXIncludeEndNode"), addFinalizer = NA)
{
kids = xmlChildren(X, addFinalizer = addFinalizer)
if(length(omitNodeTypes))
kids = kids[! sapply(kids, function(x) any(inherits(x, omitNodeTypes)) )]
lapply(kids, FUN, ...)
}
xmlSApply.XMLInternalNode =
function(X, FUN, ..., omitNodeTypes = c("XMLXIncludeStartNode", "XMLXIncludeEndNode"), addFinalizer = NA)
{
kids = xmlChildren(X, addFinalizer = addFinalizer)
if(length(omitNodeTypes))
kids = kids[! sapply(kids, function(x) any(inherits(x, omitNodeTypes)) )]
sapply(kids, FUN, ...)
}
xmlSApply.XMLNodeSet =
function(X, FUN, ..., omitNodeTypes = c("XMLXIncludeStartNode", "XMLXIncludeEndNode"), addFinalizer = NA)
{
sapply(X, FUN, ...)
}
xmlApply.XMLNodeSet =
function(X, FUN, ..., omitNodeTypes = c("XMLXIncludeStartNode", "XMLXIncludeEndNode"), addFinalizer = NA)
{
lapply(X, FUN, ...)
}
getChildrenStrings =
function(node, encoding = getEncoding(node), asVector = TRUE, len = xmlSize(node),
addNames = TRUE)
{
encoding = getEncodingREnum(encoding)
.Call("R_childStringValues", node, as.integer(len), as.logical(asVector), as.integer(encoding),
as.logical(addNames), PACKAGE = "XML")
}
setMethod("xmlParent", "XMLInternalNode",
function(x, addFinalizer = NA, ...)
{
.Call("RS_XML_xmlNodeParent", x, addFinalizer, PACKAGE = "XML")
})
newXMLDTDNode <-
function(nodeName, externalID = character(), systemID = character(), doc = NULL, addFinalizer = NA)
{
if(length(nodeName) > 1 && missing(externalID))
externalID = nodeName[2]
if(length(nodeName) > 2 && missing(systemID))
systemID = nodeName[3]
.Call("R_newXMLDtd", doc, as.character(nodeName), as.character(externalID), as.character(systemID),
addFinalizer, PACKAGE = "XML")
}
setInternalNamespace =
function(node, ns)
{
.Call("R_xmlSetNs", node, ns, FALSE, PACKAGE = "XML") # as.logical(append))
}
addDocFinalizer =
function(doc, finalizer)
{
fun = NULL
if(is.logical(finalizer)) {
if(is.na(finalizer) || !finalizer)
return()
else
fun = NULL
} else {
fun = finalizer
if(inherits(fun, "NativeSymbolInfo"))
fun = fun$address
}
if(!is.null(fun) && !is.function(fun) && typeof(fun) != "externalptr")
stop("need an R function, address of a routine or NULL for finalizer")
.Call("R_addXMLInternalDocument_finalizer", doc, fun, PACKAGE = "XML")
}
HTML_DTDs =
c("http://www.w3.org/TR/html4/frameset.dtd",
"http://www.w3.org/TR/html4/loose.dtd",
"http://www.w3.org/TR/html4/strict.dtd",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
)
newHTMLDoc =
function(dtd = "loose", addFinalizer = TRUE, name = character(),
node = newXMLNode("html", newXMLNode("head", addFinalizer = FALSE), newXMLNode("body", addFinalizer = FALSE),
addFinalizer = FALSE))
{
if(is.na(dtd) || dtd == "")
dtd = ""
else if(tolower(dtd) %in% c("html5", "5"))
dtd = "5"
else {
i = grep(dtd, HTML_DTDs)
if(length(i)) {
if(length(i) > 1)
warning("matched multiple DTDs. Using the first")
dtd = HTML_DTDs[i[1]]
} else
dtd = ""
}
doc = newXMLDoc(dtd = dtd, isHTML = TRUE, addFinalizer = addFinalizer, node = node)
doc
}
newXMLDoc <-
#
# Creates internal C-level libxml object for representing
# an XML document/tree of nodes.
#
function(dtd = "", namespaces = NULL, addFinalizer = TRUE, name = character(), node = NULL,
isHTML = FALSE)
{
if(is(dtd, "XMLInternalNode")) {
dtdNode = dtd
dtd = character()
} else
dtdNode = NULL
ans = .Call("R_newXMLDoc", dtd, namespaces, as.logical(isHTML), PACKAGE = "XML")
class(ans) = oldClass(class(ans))
addDocFinalizer(ans, addFinalizer)
if(length(name))
docName(ans) = as.character(name)
if(length(dtdNode))
addChildren(ans, dtdNode)
if(length(node)) {
if(is.character(node))
## was parent = doc
newXMLTextNode(node, addFinalizer = FALSE, parent = ans)
else
addChildren(ans, node)
}
ans
}
XMLOptions = new.env()
getOption =
function(name, default = NULL, converter = NULL)
{
if(!exists(name, XMLOptions, inherits = FALSE))
return(base::getOption(name, default))
ans = get(name, XMLOptions)
if(is.function(converter))
converter(ans)
else
ans
}
setOption =
function(name, value)
{
prev = getOption(name)
assign(name, value, XMLOptions)
prev
}
newXMLNode <-
###XXX Note that there is another definition of this in dups.R
# Which is now elided.
# Create an internal C-level libxml node
#
#
# It is possible to use a namespace prefix that is not defined.
# This is okay as it may be defined in another node which will become
# an ancestor of this newly created one.
# XXX Have to add something to force the namespace prefix into the node
# when there is no corresponding definition for that prefix.
function(name, ..., attrs = NULL,
namespace = character(), namespaceDefinitions = character(),
doc = NULL, .children = list(...), parent = NULL,
at = NA,
cdata = FALSE,
suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE), # i.e. warn.
sibling = NULL, addFinalizer = NA,
noNamespace = length(namespace) == 0 && !missing(namespace),
fixNamespaces = c(dummy = TRUE, default = TRUE)
)
{
# determine whether we know now that there is definitely no namespace.
# make certain we have a character vector for the attributes.
if(length(attrs)) {
ids = names(attrs)
attrs = structure(as(attrs, "character"), names = ids)
# Find any attributes that are actually namespace definitions.
i = grep("^xmlns", names(attrs))
if(length(i)) {
warning("Don't specify namespace definitions via 'attrs'; use namespaceDefinitions")
namespace = c(namespace, structure(attrs[i], names = gsub("^xmlns:", "", names(attrs)[i])))
attrs = attrs[ -i]
}
} else
attrs = character()
# allow the caller to specify the node name as ns_prefix:name
# but we have to create it as name and the set the namespace.
ns = character() # the namespace prefix
name = strsplit(name, ":")[[1]]
if(length(name) == 2) {
ns = name[1]
name = name[2]
noNamespace = FALSE
}
if(is.list(parent)) {
if(length(parent) < 1 ||
!(is(parent[[1]], "XMLInternalElementNode") || is(parent[[1]], "XMLInternalDocument")))
stop("incorrect value for parent")
parent = parent[[1]]
}
# if there is no doc, but we have a parent which is an XMLInternalDocument, use that.
if(missing(doc) && !missing(parent) &&
inherits(parent, "XMLInternalDocument")) {
doc = parent
parent = NULL
}
# Get the doc from the parent node/document.
if(is.null(doc) && !is.null(parent)) {
# doc = as(parent, "XMLInternalDocument")
doc = if(inherits(parent, "XMLInternalDocument"))
parent
else
.Call("R_getXMLNodeDocument", parent, PACKAGE = "XML")
}
# create the node. Let's leave the namespace definitions and prefix till later.
# xmlSetProp() routine in R_newXMLNode() handles namespaces on the attribute names, even checking them.
node <- .Call("R_newXMLNode", as.character(name), character(), character(), doc, namespaceDefinitions,
addFinalizer, PACKAGE = "XML")
if(!is.null(sibling))
addSibling(sibling, node, after = as.logical(at))
else if(!is.null(parent))
addChildren(parent, node, at = at)
if(TRUE) { # Create the name space definitions here rather than in C code.
nsDefs = lapply(seq(along = namespaceDefinitions),
function(i)
newNamespace(node, namespaceDefinitions[[i]], names(namespaceDefinitions)[i], set = FALSE)
)
if(length(namespaceDefinitions))
names(nsDefs) = if(length(names(namespaceDefinitions))) names(namespaceDefinitions) else ""
} else
nsDefs = xmlNamespaceDefinitions(node)
# Now that the namespaces are defined, we can define the attributes which _may_ use them.
addAttributes(node, .attrs = attrs, suppressNamespaceWarning = suppressNamespaceWarning)
if(is(namespace, "XMLNamespaceRef")) {
setInternalNamespace(node, namespace)
} else if(is.na(noNamespace) || !noNamespace) {
ns = getNodeNamespace(ns, nsDefs, node, namespace, noNamespace, namespaceDefinitions, parent, suppressNamespaceWarning)
if(is.null(ns))
!.Call("R_setNamespaceFromAncestors", node, PACKAGE = "XML")
# .Call("R_getAncestorDefaultNSDef", node, TRUE, PACKAGE = "XML")
}
# Here is where we set the namespace for this node.
if(length(ns) && (inherits(ns, c("XMLNamespaceRef", "XMLNamespaceDeclaration")) || (is.character(ns) && ns != "")))
setXMLNamespace( node, ns) # should this be append = FALSE ?
# Add any children to this node.
if(length(.children)) {
if(!is.list(.children))
.children = list(.children)
addChildren(node, kids = .children, cdata = cdata, addFinalizer = addFinalizer)
}
if(any(fixNamespaces)) { # !is.null(parent)) {
xmlFixNamespaces(node, fixNamespaces)
# fixDummyNS(node, suppressNamespaceWarning)
}
node
}
xmlFixNamespaces =
function(node, fix)
{
if(length(fix) == 1)
fix = structure(rep(fix, 2), names = c("dummy", "default"))
if(length(names(fix)) == 0)
names(fix) = c("dummy", "default")
if(fix["dummy"])
xmlApply(node, function(x) .Call("R_fixDummyNS", x, TRUE, PACKAGE = "XML"))
if(fix["default"])
.Call("R_getAncestorDefaultNSDef", node, TRUE, PACKAGE = "XML")
}
FixDummyNS = 2L
FixDefaultNS = 4L
xmlNamespaceRef =
function(node)
.Call("R_getXMLNsRef", node, PACKAGE = "XML")
if(FALSE) {
# Quick check to see if the speed problem in newXMLNode above is in the extra processing
newXMLNode <-
function(name, ..., attrs = NULL,
namespace = "", namespaceDefinitions = character(),
doc = NULL, .children = list(...), parent = NULL,
at = NA,
cdata = FALSE,
suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE) # i.e. warn.
)
{
node = .Call("R_newXMLNode", name, as.character(attrs), character(), doc, character(), TRUE, PACKAGE = "XML")
if(!is.null(parent))
addChildren(parent, node, at = at)
node
}
}
findNamespaceDefinition =
#
# Search up the node hierarchy looking for a namespace
# matching that prefix.
#
function(node, namespace, error = TRUE)
{
ptr = node
while(!is.null(ptr)) {
tmp = namespaceDeclarations(ptr, TRUE)
i = match(namespace, names(tmp))
if(!is.na(i))
return(tmp[[i]])
ptr = xmlParent(ptr)
}
if(error)
stop("no matching namespace definition for prefix ", namespace)
NULL
}
setXMLNamespace =
#
# Set the specified namespace as the namespace for this
# node.
# namespace can be a prefix in which case we find it in the
# definition in this node or its ancestors.
# Otherwise, we expect a name = value character vector giving the
# prefix and URI and we create a new namespace definition.
# Alternatively, if you already have the namespace reference object
# from earlier, you can pass that in.
# Then we set the namespace on the node.
function(node, namespace, append = FALSE)
{
if(is.character(namespace) && is.null(names(namespace)))
namespace = findNamespaceDefinition(node, namespace)
else if(is.character(namespace))
namespace = newNamespace(node, namespace)
else if(!is.null(namespace) && !inherits(namespace, c("XMLNamespaceRef", "XMLNamespaceDeclaration")))
stop("Must provide a namespace definition, a prefix of existing namespace or a reference to a namespace definition")
.Call("R_xmlSetNs", node, namespace, FALSE, PACKAGE = "XML")
}
setAs("XMLNamespace", "character",
function(from)
unclass(from))
setAs("XMLNamespaceDefinition", "character",
function(from)
structure(from$uri, names = from$id))
setGeneric("xmlNamespace<-",
function(x, ..., value)
standardGeneric("xmlNamespace<-"))
setMethod("xmlNamespace<-", "XMLInternalNode",
function(x, ..., value) {
setXMLNamespace(x, value)
x
})
setGeneric("xmlNamespaces<-",
function(x, append = TRUE, set = FALSE, value)
standardGeneric("xmlNamespaces<-"))
setMethod("xmlNamespaces<-", "XMLNode",
function(x, append = TRUE, set = FALSE, value) {
if(inherits(value, "XMLNamespace"))
value = as(value, "character")
else if(is.null(names(value)))
names(value) = ""
# check for duplicates?
i = duplicated(names(value))
if(any(i)) {
warning("discarding duplicated namespace prefixes ", paste(names(value)[i], collapse = ", "))
value = value[!i]
}
if(append) {
cur = as(x$namespaceDefinitions, "character")
cur[names(value)] = value
value = cur
}
x$namespaceDefinitions = as(value, "XMLNamespaceDefinitions")
if(set)
x$namespace = names(value)
x
})
setMethod("xmlNamespaces<-", "XMLInternalNode",
function(x, append = TRUE, set = FALSE, value) {
value = as(value, "character")
if(is.null(names(value)))
names(value) = ""
# check for duplicates?
i = duplicated(names(value))
if(any(i)) {
warning("discarding duplicated namespace prefixes ", paste(names(value)[i], collapse = ", "))
value = value[!i]
}
if(append) {
# Work with existing ones
curDefs = namespaceDeclarations(x)
i = names(value) %in% names(curDefs)
if(any(i)) {
warning("discarding duplicated namespace prefixes ", paste(names(value)[i], collapse = ", "))
value = value[!i]
}
}
if(length(value) == 0)
# Should worry about the set.
return()
if(length(set) == 1 && set == TRUE && length(value) > 1)
set = c(set, rep(FALSE, length(value) - 1))
else
set = rep(set, length.out = length(value))
for(i in seq(along = value))
newXMLNamespace(x, value[i], set = set[i])
x
})
newXMLNamespace = newNamespace =
# Create a new namespace reference object.
function(node, namespace, prefix = names(namespace), set = FALSE)
{
if(is.null(namespace))
return(NULL) # XXX
ns <- .Call("R_xmlNewNs", node, namespace, as.character(prefix), PACKAGE = "XML")
if(set)
setXMLNamespace(node, ns)
ns
}
checkNodeNamespace =
#
# can only be checked after we know the parent node,
# i.e. after it has been inserted.
#
function(node, prefix = xmlNamespace(node))
{
if(length(prefix) == 0 || prefix == "")
return(TRUE)
# XXX should check that namespace is defined
# walk the parents.
okay = FALSE
p = xmlParent(node)
while(!is.null(p)) {
okay = prefix %in% names(xmlNamespaceDefinitions(p))
if(okay)
break
}
if(!okay)
stop("using an XML namespace prefix '", prefix, "' for a node that is not defined for this node or its node's ancestors")
TRUE
}
# Still to do:
# element, entity, entity_ref, notation
# And more in libxml/tree.h, e.g. the declaration nodes
#
newXMLTextNode =
#
# cdata allows the caller to specify that the text be
# wrapped in a newXMLCDataNode
function(text, parent = NULL, doc = NULL, cdata = FALSE, escapeEntities = is(text, "AsIs"),
addFinalizer = NA)
{
if(cdata)
return(newXMLCDataNode(text, parent, doc, addFinalizer = addFinalizer))
a = .Call("R_newXMLTextNode", as.character(text), doc, addFinalizer, PACKAGE = "XML")
if(escapeEntities)
setNoEnc(a)
if(!is.null(parent))
addChildren(parent, a)
a
}
newXMLPINode <-
function(name, text, parent = NULL, doc = NULL, at = NA, addFinalizer = NA)
{
a = .Call("R_newXMLPINode", doc, as.character(name), as.character(text), addFinalizer, PACKAGE = "XML")
if(!is.null(parent))
addChildren(parent, a, at = at)
a
}
newXMLCDataNode <-
function(text, parent = NULL, doc = NULL, at = NA, sep = "\n", addFinalizer = NA)
{
text = paste(as.character(text), collapse = "\n")
a = .Call("R_newXMLCDataNode", doc, text, addFinalizer, PACKAGE = "XML")
if(!is.null(parent))
addChildren(parent, a, at = at)
a
}
newXMLCommentNode <-
function(text, parent = NULL, doc = NULL, at = NA, addFinalizer = NA)
{
a = .Call("R_xmlNewComment", as.character(text), doc, addFinalizer, PACKAGE = "XML")
if(!is.null(parent))
addChildren(parent, a, at = at)
a
}
replaceNodes =
function(oldNode, newNode, ...)
{
UseMethod("replaceNodes")
}
replaceNodes.list =
function(oldNode, newNode, addFinalizer = NA, ...)
{
mapply(replaceNodes, oldNode, newNode, MoreArgs = list(addFinalizer = addFinalizer, ...))
}
replaceNodes.XMLInternalNode =
function(oldNode, newNode, addFinalizer = NA, ...)
{
oldNode = as(oldNode, "XMLInternalNode")
#XXX deal with a list of nodes.
newNode = as(newNode, "XMLInternalNode")
.Call("RS_XML_replaceXMLNode", oldNode, newNode, addFinalizer, PACKAGE = "XML")
}
#
if(FALSE) # This is vectorized for no reason
"[[<-.XMLInternalNode" =
function(x, i, j, ..., value)
{
if(!is.list(value))
value = list(value)
if(is.character(i)) {
if(length(names(x)) == 0)
k = rep(NA, length(i))
else
k = match(i, names(x))
if(any(is.na(k))) {
# create a node with that name and text
value[is.na(k)] = mapply(function(name, val)
if(is.character(val))
newXMLNode(name, val)
else
val)
}
i = k
}
replace = (i <= xmlSize(x))
if(any(replace)) {
replaceNodes(xmlChildren(x)[i[replace]], value[replace])
value = value[!replace]
i = i[!replace]
}
if(length(i))
addChildren(x, kids = value, at = i)
x
}
"[[<-.XMLInternalNode" =
function(x, i, j, ..., value)
{
if(is.character(i)) {
if(length(names(x)) == 0)
k = NA
else
k = match(i, names(x))
if(is.na(k) && is.character(value) && !inherits(value, "AsIs")) {
# create a node with that name and text
value = newXMLNode(i, value)
}
i = k
}
replace = !is.na(i) & (i <= xmlSize(x))
if(replace)
replaceNodes(xmlChildren(x)[[i]], value)
else
addChildren(x, kids = list(value), at = i)
x
}
setNoEnc =
function(node)
{
if(!is(node, "XMLInternalTextNode"))
stop("setNoEnc can only be applied to an native/internal text node, not ", paste(class(node), collapse = ", "))
.Call("R_setXMLInternalTextNode_noenc", node, PACKAGE = "XML")
}
addChildren.XMLInternalNode =
addChildren.XMLInternalDocument =
#
# XXX need to expand/recycle the at if it is given as a scalar
# taking into account if the subsequent elements are lists, etc.
#
# Basically, if the caller specifies at as a scalar
# we expand this to be the sequence starting at that value
# and having length which is the total number of nodes
# in kids. This is not just the length of kids but
# the number of nodes since some of the elements might be lists.
#
function(node, ..., kids = list(...), at = NA, cdata = FALSE, addFinalizer = NA,
fixNamespaces = c(dummy = TRUE, default = TRUE))
{
kids = unlist(kids, recursive = FALSE)
removeNodes(kids[!vapply(kids, is.character, logical(1L))])
if(length(kids) == 1 && inherits(kids[[1]], "XMLInternalNode") && is.na(at)) {
.Call("R_insertXMLNode", kids[[1]], node, -1L, FALSE, PACKAGE = "XML")
# return(node)
} else {
# if(all(is.na(at))) {
# kids = lapply(kids, as, function(x) if(is.character(x)) newXMLTextNode(x) else as(x, "XMLInternalNode"))
# .Call("R_insertXMLNodeDirectly", node, kids, PACKAGE = "XML")
# return(node)
# }
if(!is.na(at)) {
# if at is the name of a child node, find its index (first node with that name)
if(is.character(at))
at = match(at, names(node))
if(length(at) == 1)
at = seq(as.integer(at), length = sum(sapply(kids, function(x) if(is.list(x)) length(x) else 1)))
else # pad with NAs
length(at) = length(kids)
return(lapply(seq(along = kids),
function(j) {
i = kids[[j]]
if(is.character(i))
i = newXMLTextNode(i, cdata = cdata, addFinalizer = addFinalizer)
if(!inherits(i, "XMLInternalNode")) #XX is(i, "XMLInternalNode")
i = as(i, "XMLInternalNode")
if(.Call("R_isNodeChildOfAt", i, node, as.integer(at[j]), PACKAGE = "XML"))
return(i)
if(is.na(at[j]))
.Call("R_insertXMLNode", i, node, -1L, FALSE, PACKAGE = "XML")
else {
after = at[j] > 0
if(!after)
at[j] = 1
if(xmlSize(node) < at[j])
.Call("R_insertXMLNode", i, node, as.integer(NA), FALSE, PACKAGE = "XML")
else
.Call("RS_XML_xmlAddSiblingAt", node[[ at[j] ]], i, after, addFinalizer, PACKAGE = "XML") # if at = 0, then shove it in before the sibling.
}
}))
}
for(j in seq(along = kids)) {
i = kids[[j]]
if(is.list(i)) { # can't happen now since we unlist()
for(k in i)
addChildren(node, k, addFinalizer = addFinalizer)
} else {
if(is.null(i))
next
if(is.character(i))
i = newXMLTextNode(i, cdata = cdata, addFinalizer = FALSE)
if(!inherits(i, "XMLInternalNode")) {
i = as(i, "XMLInternalNode")
}
.Call("R_insertXMLNode", i, node, at[j], FALSE, PACKAGE = "XML")
ns = attr(i, "xml:namespace")
if(!is.null(ns)) {
nsdef = findNamespaceDefinition(node, ns)
if(!is.null(nsdef) && (inherits(nsdef, c("XMLNamespaceRef", "XMLNamespaceDeclaration")) || (is.character(nsdef) && nsdef != ""))) {
setXMLNamespace( i, nsdef)
attr(i, "xml:namespace") = NULL
}
}
}
}
}
if(!is(node, "XMLInternalDocument") && any(fixNamespaces))
xmlFixNamespaces(node, fixNamespaces)
node
}
addSibling =
function(node, ..., kids = list(...), after = NA)
{
UseMethod("addSibling")
}
addSibling.XMLInternalNode =
function(node, ..., kids = list(...), after = TRUE, addFinalizer = NA)
{
#XXX Why add as children?
if(FALSE && is.na(after))
addChildren(node, kids = kids, at = NA)
else {
lapply(kids,
function(x) {
.Call("RS_XML_xmlAddSiblingAt", node, x, as.logical(after), addFinalizer, PACKAGE = "XML")
})
}
}
removeNodes =
function(node, free = rep(FALSE, length(node)))
UseMethod("removeNodes")
removeNodes.default =
function(node, free = rep(FALSE, length(node)))
NULL
removeNodes.list = removeNodes.XMLNodeList =
function(node, free = rep(FALSE, length(node)))
{
if(!all(sapply(node, inherits, "XMLInternalNode"))) {
warning("removeNode only works on internal nodes at present")
return(NULL)
}
free = as.logical(free)
free = rep(free, length = length(node))
.Call("R_removeInternalNode", node, free, PACKAGE = "XML")
}
removeNodes.XMLNodeSet =
function(node, free = rep(FALSE, length(node)))
{
removeNodes.list(node, free)
}
removeNodes.XMLInternalNode =
function(node, free = rep(FALSE, length(node)))
{
node = list(node)
free = as.logical(free)
.Call("R_removeInternalNode", node, free, PACKAGE = "XML")
}
removeChildren =
function(node, ..., kids = list(...), free = FALSE)
{
UseMethod("removeChildren")
}
removeChildren.XMLNode =
#
#
function(node, ..., kids = list(...), free = FALSE)
{
kidNames = names(node)
w = sapply(kids,
function(i) {
orig = i
if(length(i) > 1)
warning("each node identifier should be a single value, i.e. a number or a name, not a vector. Ignoring ",
paste(i[-1], collapse = ", "))
if(!inherits(i, "numeric"))
i = match(i, kidNames)
if(is.na(i)) {
warning("can't find node identified by ", orig)
i = 0
}
i
})
node$children = unclass(node)$children[ - w ]
node
}
removeChildren.XMLInternalNode =
function(node, ..., kids = list(...), free = FALSE)
{
# idea is to get the actual XMLInternalNode objects
# corresponding the identifiers in the kids list.
# These are numbers, node names or node objects themselves
# This could be fooled by duplicates, e.g. kids = list(2, 2)
# or kids = list(2, "d") where "d" identifies the second node.
# We can put in stricter checks in the C code if needed.
nodes = xmlChildren(node)
nodeNames = xmlSApply(node, xmlName)
v = lapply(kids,
function(x) {
if(inherits(x, "XMLInternalNode"))
x
else if(is.character(x)) {
i = match(x, nodeNames)
nodes[[i]]
} else
nodes[[as.integer(x)]]
})
free = rep(free, length = length(v))
.Call("RS_XML_removeChildren", node, v, as.logical(free), PACKAGE = "XML")
node
}
replaceNodeWithChildren =
function(node)
{
if(!inherits(node, "XMLInternalNode"))
stop("replaceNodeWithChildren only work on internal XML/HTML nodes")
.Call("R_replaceNodeWithChildren", node, PACKAGE = "XML")
}
setGeneric("toHTML",
function(x, context = NULL) standardGeneric("toHTML"))
setMethod('toHTML', 'vector',
function(x, context = NULL) {
tb = newXMLNode("table")
if(length(names(x)) > 0)
addChildren(tb, newXMLNode("tr", .children = sapply(names(x), function(x) newXMLNode("th", x))))
addChildren(tb, newXMLNode("tr", .children = sapply(x, function(x) newXMLNode("th", format(x)))))
tb
})
setMethod('toHTML', 'matrix',
function(x, context = NULL) {
tb = newXMLNode("table")
if(length(colnames(x)) > 0)
addChildren(tb, newXMLNode("tr", .children = sapply(names(x), function(x) newXMLNode("th", x))))
rows = sapply(seq(length = nrow(x)),
function(i) {
row = newXMLNode("tr")
if(length(rownames(x)) > 0)
addChildren(row, newXMLNode("th", rownames(x)[i]))
addChildren(row, .children = sapply(x[i,], function(x) newXMLNode("th", format(x))))
row
})
addChildren(tb, rows)
tb
})
SpecialCallOperators =
c("+", "-", "*", "/", "%*%", "%in%", ":")
#XXX Not necessarily working yet! See RXMLDoc
setMethod('toHTML', 'call',
function(x, context) {
# handle special operators like +, -, :, ...
if(as.character(v[[1]]) %in% SpecialCallOperators) {
}
v = newXMLNode(x[[1]], "(")
for(i in v[-1])
addChildren(v, toHTML( i , context))
v
})
setAs("vector", "XMLInternalNode",
function(from) {
newXMLTextNode(as(from, "character"))
})
print.XMLInternalDocument =
function(x, ...)
{
cat(as(x, "character"), "\n")
}
print.XMLInternalNode =
function(x, ...)
{
cat(as(x, "character"), "\n")
}
setAs("XMLInternalNode", "character",
function(from) saveXML.XMLInternalNode(from))
setAs("XMLInternalTextNode", "character",
function(from) xmlValue(from))
checkAttrNamespaces =
function(nsDefs, .attrs, suppressNamespaceWarning)
{
ns = sapply(strsplit(names(.attrs), ":"),
function(x) if(length(x) > 1) x[1] else NA)
i = which(!is.na(ns))
m = match(ns[i], names(nsDefs))
if(any(is.na(m))) {
f = if(is.character(suppressNamespaceWarning))
get(suppressNamespaceWarning, mode = "function")
else
warning
f(paste("missing namespace definitions for prefix(es)", paste(ns[i][is.na(m)])))
}
}
setGeneric("addAttributes",
function(node, ..., .attrs = NULL, suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE), append = TRUE)
standardGeneric("addAttributes"))
setMethod("addAttributes", "XMLInternalElementNode",
function(node, ..., .attrs = NULL, suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE), append = TRUE)
{
if(missing(.attrs))
.attrs = list(...)
.attrs = structure(as.character(.attrs), names = names(.attrs))
if(length(.attrs) == 0)
return(node)
if(is.null(names(.attrs)) || any(names(.attrs) == ""))
stop("all node attributes must have a name")
if(is.character(suppressNamespaceWarning) || !suppressNamespaceWarning)
checkAttrNamespaces(getEffectiveNamespaces(node), .attrs, suppressNamespaceWarning)
if(!append)
removeAttributes(node, .all = TRUE)
.Call("RS_XML_addNodeAttributes", node, .attrs, PACKAGE = "XML")
node
})
#if(!isGeneric("xmlAttrs<-"))
setGeneric("xmlAttrs<-", function(node, append = TRUE, suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE), value)
standardGeneric("xmlAttrs<-"))
tmp =
function(node, append = TRUE, suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE), value)
{
addAttributes(node, .attrs = value, suppressNamespaceWarning = suppressNamespaceWarning, append = append)
}
setMethod("xmlAttrs<-", "XMLInternalElementNode", tmp)
setMethod("xmlAttrs<-", "XMLNode", tmp)
setMethod("addAttributes", "XMLNode",
function(node, ..., .attrs = NULL, suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE), append = TRUE) {
if(missing(.attrs))
.attrs = list(...)
.attrs = structure(as.character(.attrs), names = names(.attrs))
if(is.null(names(.attrs)) || any(names(.attrs) == ""))
stop("all node attributes must have a name")
if(is.character(suppressNamespaceWarning) || !suppressNamespaceWarning)
checkAttrNamespaces(getEffectiveNamespaces(node), .attrs, suppressNamespaceWarning)
if(append) {
i = match(names(.attrs), names(node$attributes))
if(any(!is.na(i))) {
node$attributes[i[!is.na(i)]] = .attrs[!is.na(i)]
.attrs = .attrs[is.na(i)]
}
node$attributes = c(node$attributes, .attrs)
} else
node$attributes = .attrs
node
})
setGeneric("removeAttributes", function(node, ..., .attrs = NULL, .namespace = FALSE,
.all = (length(list(...)) + length(.attrs)) == 0)
standardGeneric("removeAttributes"))
setGeneric("removeXMLNamespaces",
function(node, ..., all = FALSE, .els = unlist(list(...)))
standardGeneric("removeXMLNamespaces"))
setMethod("removeXMLNamespaces", "XMLInternalElementNode",
function(node, ..., all = FALSE, .els = unlist(list(...))) {
if(all)
.Call("RS_XML_removeAllNodeNamespaces", node, PACKAGE = "XML")
else {
if(is.character(.els))
.els = lapply(.els, function(x) x)
.Call("RS_XML_removeNodeNamespaces", node, .els, PACKAGE = "XML")
}
})
setMethod("removeAttributes", "XMLInternalElementNode",
#
# The idea here is to remove attributes by name
# We handle the case where these are a simple collection
# of character string identifiers given via the ... or as a character
# vector using, e.g., .attrs = c("a", "b")
#
# Each identifier can be of the form "name" or "ns:name" giving
# the namespace prefix. We resolve the namespace and
#
# If we are dealing with regular attributes (no namespace attributes)
# then we expect these as a character vector.
#
# The intent of the .namespace argument was originally to indicate that
# we wanted to remove the namespace definition. It appears that libxml2 does
# not support that. (And it would seem that this is a real pain as the xmlNsPtr
# objects can be shared across numerous places in a linked list, so it would
# be very difficult to remove it from one node.)
#
#
#
function(node, ..., .attrs = NULL, .namespace = FALSE,
.all = (length(list(...)) + length(.attrs)) == 0)
{
if(missing(.attrs))
.attrs = list(...)
.attrs = as.character(.attrs)
if(.all) {
if(length(list(...)) || length(.attrs))
stop(".all specified as TRUE and individual values specified via .../.attrs")
# Use the integer indices to identify the elements.
.Call("RS_XML_removeNodeAttributes", node, seq(along = xmlAttrs(node)), FALSE, PACKAGE = "XML")
return(node)
}
if(is(.namespace, "XMLNamespaceDeclaration"))
.namespace = list(.namespace)
#XXX
tmp = strsplit(.attrs, ":")
prefix = sapply(tmp, function(x) if(length(x) > 1) x[1] else "")
ids = sapply(tmp, function(x) if(length(x) == 1) x[1] else x[2])
if(any(prefix != "") && is.logical(.namespace))
.namespace = TRUE
if(is.logical(.namespace) && .namespace) {
ns = namespaceDeclarations(node, TRUE)
# need to create a list with the elements corresponding to the
# (potentially repeated) ns elements
i = match(prefix, names(ns))
ns = ns[i]
names(ns) = gsub("^.*:", "", .attrs) # or ids from above
.attrs = ns
}
.Call("RS_XML_removeNodeAttributes", node, .attrs, .namespace, PACKAGE = "XML")
node
})
setMethod("removeAttributes", "XMLNode",
function(node, ..., .attrs = NULL, .namespace = FALSE,
.all = (length(list(...)) + length(.attrs)) == 0)
{
a = node$attributes
if(missing(.attrs))
.attrs = list(...)
.attrs = as.character(.attrs)
if(.all) {
if(length(.attrs))
stop("Both individual attribute names and .all specified")
node$attributes = character()
return(node)
}
i = match(.attrs, names(a))
if(any(is.na(i)) )
warning("Can't locate attributes ", paste(.attrs[is.na(i)], collapse = ", "), "in XML node ", node$name)
a = a[is.na(i)]
node$attributes <- a
node
})
#xmlNamespaceDefinitions = # ??? added this but overrides other S3 generic.
namespaceDeclarations =
function(node, ref = FALSE, ...)
{
.Call("RS_XML_getNsList", node, as.logical(ref), PACKAGE = "XML")
}
"xmlName<-" =
function(x, value)
{
UseMethod("xmlName<-")
}
"xmlName<-.XMLNode" <-
function(x, value)
{
x$name <- value
x
}
"xmlName<-.XMLInternalElementNode" <-
function(x, value)
{
# we could handle a new namespace by accepting value as
# a character vector with a name
# e.g. c(r:array = 'http://www.r-project.org')
# Alternatively, just define the namespace on the node _before_
# changing the name.
id = names(value)
if(!is.null(id) && length( (tmp <- strsplit(id, ":")[[1]])) > 1) {
names(value) = tmp[1]
newXMLNamespaces(x, .values = as(value, "character"))
value = id
}
.Call("RS_XML_setNodeName", x, value, PACKAGE = "XML")
x
}
newXMLNamespaces =
# allow for multiple namespaces
# and also allow for "r:value"
#
# newXMLNamespaces(node, r = "http://www.r-project.org", ...)
#
function(node, ..., .values = list(...))
{
ids = names(.values)
ans = lapply(ids, function(id)
newNamespace(node, id, as.character(.values[[id]])))
names(ans) = ids
ans
}
xmlNodeMatch =
function(x, table, nomatch = NA_integer_)
{
.Call("R_matchNodesInList", x, table, as.integer(nomatch), PACKAGE = "XML")
}
setGeneric("xmlClone",
function(node, recursive = TRUE, addFinalizer = FALSE, ...)
{
oclass = class(node)
ans = standardGeneric("xmlClone")
if(!isS4(node))
class(ans) = oclass
ans
})
setMethod("xmlClone", "XMLInternalDocument",
function(node, recursive = TRUE, addFinalizer = NA, ...)
{
ans = .Call("RS_XML_clone", node, as.logical(recursive), addFinalizer, PACKAGE = "XML")
addDocFinalizer(ans, addFinalizer)
ans
})
setMethod("xmlClone", "XMLInternalNode",
function(node, recursive = TRUE, addFinalizer = FALSE, ...)
{
ans = .Call("RS_XML_clone", node, as.logical(recursive), addFinalizer, PACKAGE = "XML")
})
ensureNamespace =
#
# Idea is to make certain that the root node has definitions for the specified
# namespaces. The caller specifies the named vector of interest.
# If the URL already exists, we return the corresponding prefix.
#
#
# Returns the prefixes in the documents that correspond to the
# namespace definitions
#
function(doc, what)
{
if(is(doc, "XMLInternalDocument"))
node = xmlRoot(doc)
else
node = doc
defs = xmlNamespaceDefinitions(xmlRoot(doc), simplify = TRUE)
i = match(what, defs)
w = is.na(i)
if(any(w)) {
sapply(names(what)[w], function(id) newXMLNamespace(node, what[id], id))
names(what)[w]
} else
names(defs)[i]
}
"xmlParent<-" =
function(x, ..., value) {
addChildren(value, ..., kids = list(x))
}
setOldClass("XMLNamespaceRef")
setAs("XMLNamespaceRef", "character",
function(from) {
.Call("R_convertXMLNsRef", from, PACKAGE = "XML")
})
xmlSearchNs =
function(node, ns, asPrefix = TRUE, doc = as(node, "XMLInternalDocument"))
{
.Call("R_xmlSearchNs", doc, node, as.character(ns), as.logical(asPrefix), PACKAGE = "XML")
}
XML/R/xmlOutputBuffer.R 0000644 0001751 0000144 00000021220 13607633674 014443 0 ustar hornik users xmlOutputBuffer <-
#
#
# Want to check with the DTD whether a tag is legitimate
# attributes are valid, etc.
#
# Add an indentation level.
#
# Need to escape characters via entities:
# <- => %sgets;
# < => %lt;
# > => %gt;
# etc.
#
#
# Allow xmlEndTag with no argument to allow closing the current one.
# (Maintain a stack)
#
# Allow addTag(tag, addTag(), addTag(),)
#
#
# The handling of the connection (i.e. the buf argument) can
# be cleaned up using the OOP package from Omegahat. This will be done
# in the future.
#
#
# sapply(names(nameSpace), function(i, x){paste("xmlns:",i,"=\"",x[[i]],"\"", sep="")}, x=nameSpace)
#
#
#
function(dtd = NULL, nameSpace = NULL, buf = NULL, nsURI = NULL,
header = "")
{
# If the user gave as an existing buffer and header is non-NULL,
# we appendd it to the buffer. This can be used for adding things
# like section breaks, etc.
#
# If the user did not give us a buffer, then we use the header.
#
# This is done immediately the function is called, rather than
# in the calls to the functions of the closure after it is returned.
if(is.null(buf))
buf <- header
else if(inherits(buf, "connection")) {
if(!isOpen(buf)) {
open(buf, rw = "w")
on.exit(close(buf))
}
cat(header,"\n", sep="", file = buf)
} else if(!is.null(header))
cat(header,"\n", sep="", file = buf)
emittedDocType <- FALSE
if(missing(nameSpace) && !is.null(nsURI) && !is.null(names(nsURI))) {
nameSpace <- names(nsURI)[1]
}
openTags <- NULL #list()
lastTag <- 0
# This is called from addTag() when the tag being
# emitted into the stream is left open by that call.
# We store the tag name, its namespace and the URI of the
# namespace if there is one in this call.
# This triple is appended as the last row of the openTags
# matrix and lastTag is incremented.
addOpenTag <- function(tag, ns, xmlns) {
lastTag <<- lastTag+1
if( lastTag == 1 ) {
rval <- matrix(c(tag, if(is.null(ns)) "" else ns, if(is.null(xmlns)) "" else xmlns),
nrow = 1, dimnames=list(NULL, c("tagname","nsprefix", "nsURI")) )
} else
rval <- rbind(openTags, c(tag, ifelse(is.null(ns),openTags[lastTag-1,2], ns), ifelse(is.null(xmlns),"",xmlns)))
openTags <<- rval
}
checkNamespace <- function(ns) {
return(TRUE)
## Ignored
if( (lastTag == 0) )
stop(paste("Namespace `",ns, "' is not defined\n",sep=""))
m <- match(ns, openTags$nsprefix, NULL)
if( any(!is.null(openTags[m,"nsURI"])) )
return(FALSE)
stop(paste("Namespace:",ns, "is not defined\n",sep=" "))
}
openTag <- function(tag, ..., attrs = NULL, sep = "\n",
namespace = NULL, xmlns = NULL) {
addTag(tag, ..., attrs = attrs, sep = sep, namespace = namespace, xmlns, close = FALSE)
}
# The namespace is the prefix for the tag name.
# For example, if the namespace is shelp and the tag is arg
# the element is shelp:tag.
# In this function, we try to infer the ``current'' namespace
# if the user doesn't specify it. We also have to ensure that
# the namespace has a definition before it is used.
#
# We also need to allow the user specify an empty namespace
# so that tags
addTag <- function(tag, ..., attrs = NULL, sep = "\n", close = TRUE,
namespace = NULL, xmlns = NULL) {
tmp <- ""
# Flag indicating whether this is the very first, top-level tag.
# should be shared across these functions and part of the state of
# the output buffer instance ?
startingTag <- is.null(getOpenTag())
# The user didn't specify a namespace, then we need to check about the xmlns.
# If the user specified that, then there is an inconsistency.
# Otherwise, no namespace and no xmlns. So need to get the
# current nameSpace.
if(is.null(namespace)) {
if( !is.null(xmlns) ) {
# Really want to look this up in the set of "global" namespaces.
if(is.null(names(xmlns)))
stop("you must specify the namespace as well as xmlns")
namespace <- names(xmlns)[1]
}
else {
# so there is no xmlns.
# We need to determine what the currently active
# namespace is.
cur <- getOpenTag()
if(is.null(cur)) {
# Use the default namespace given when the buffer waas constructed
namespace <- nameSpace
# xmlns <- nsURI
} else {
startingTag <- FALSE
namespace <- cur[["nsprefix"]]
}
}
}
# if you remap prefixes this could be a problem
if(!startingTag && !is.null(namespace) && namespace == nameSpace && is.null(xmlns) ) {
tmp1 <- getOpenTag()
if(is.null(tmp1) && !is.null(nsURI)) { # || tmp1[["nsURI"]] != nsURI) {
xmlns <- nsURI[1]
} # else namespace <- NULL
}
#if xmlns is given but not the namespace, then
# check this.
if( !is.null(namespace) && is.null(xmlns) )
checkNamespace(namespace)
if( !is.null(namespace) && !is.null(xmlns) ) {
if(!is.null(names(xmlns))) {
tmpp <- xmlns
names(tmpp) <- paste("xmlns", names(tmpp), sep=":")
attrs <- c(attrs, tmpp)
} else
attrs[[paste("xmlns", namespace, sep=":")]] <- xmlns
}
if(startingTag && !is.null(nsURI)) {
tmpp <- nsURI
names(tmpp) <- paste("xmlns", names(nsURI), sep=":")
attrs <- c(attrs, tmpp)
}
# if the namespace is non-trivial (null or ""), then concatenate with the
# tag name. Also handle the case that this is the starting tag
# and so no namespaces are defined at this point.
# !startingTag &&
tagName <- if(!is.null(namespace) && namespace != "") paste(namespace,tag,sep=":") else tag
if(!is.null(attrs)) {
tmp <- paste(" ", paste(names(attrs),
paste("\"",attrs,"\"", sep=""),sep="=",
collapse=" "),sep="")
}
if(length(dtd) && !emittedDocType) {
add(paste(" 1) paste("PUBLIC", ddQuote(dtd[2])), ">"))
emittedDocType <<- TRUE
}
add(paste("<", tagName, tmp, ">", sep=""))
if(length(list(...)) > 0) {
add(..., sep=sep)
}
if(close)
add(paste(if(sep == "\n")"" else"\n", "",tagName, ">", "\n", sep=""), sep="")
else
addOpenTag(tag, namespace, xmlns)
NULL
}
closeTag <- function(name = NULL, namespace = nameSpace) {
if(is.null(name)) {
tmp <- getOpenTag()
name <- tmp[1]
if(length(tmp)>1)
namespace <- tmp[2]
openTags <<- openTags[-lastTag, ,drop = FALSE]
lastTag <<- lastTag-1
} else if(is.numeric(name)) {
for(i in 1:name)
closeTag()
return()
}
add("", ifelse(!is.null(namespace) && namespace != "", paste(namespace,name,sep=":"), name),">\n", sep="")
}
# This returns the last entry in the matrix openTags
# which should contain the currently open tag, namespace and
# associated URI.
getOpenTag <- function() {
if(lastTag > 0)
openTags[lastTag, ]
else
NULL
}
#
paste0 <- function(..., sep="", collapse="") paste(..., sep = sep, collapse=collapse)
reset <- function() {
buf <<- header
openTags <<- list()
lastTag <<- 0
}
addComment <- function(..., sep="\n") {
add("", sep=sep)
}
add <- function(..., sep="\n") {
if(is.character(buf))
buf <<- paste(buf, paste0(..., collapse=sep), sep=sep)
else
cat(paste0(..., collapse=sep), sep, sep="", file=buf)
}
addCData <- function(text) {
add("", sep="\n")
}
addPI <- function(name, text) {
add("", name, " ", text, "?>\n", sep="")
}
tagString <- function(tag, ..., attrs, close=FALSE) {
tmp <- ""
if(!missing(attrs)) {
tmp <- paste(" ", paste(names(attrs), paste("\"",attrs,"\"", sep=""), sep="=", collapse=" "),sep="")
}
return(paste0("<", tag,tmp, ">",...,"",tag,">"))
}
con <- list( value=function() {buf},
addTag = addTag,
openTag = openTag,
closeTag = closeTag,
addEndTag = closeTag,
reset = reset,
tagString = tagString,
add = add,
addComment = addComment,
addPI = addPI,
addCData = addCData,
getOpenTag=getOpenTag,
addOpenTag=addOpenTag
)
# class(con) <- c("XMLOutputBuffer", "XMLOutputStream")
# con
ans = new("XMLOutputBuffer", con)
names(ans) = names(con)
ans
}
XML/R/libxmlFeatures.R 0000644 0001751 0000144 00000000116 13607633670 014253 0 ustar hornik users
libxmlFeatures =
function()
{
.Call("R_getXMLFeatures", PACKAGE = "XML")
}
XML/R/tree.R 0000644 0001751 0000144 00000007121 13607633674 012233 0 ustar hornik users ## needed for sanity
.children <- .this <- .nodes <- .parents <- NULL
nodeIdGenerator =
#
# Not currently used. See asXMLTreeNode and the alternative default
# argument for XMLHashTree instances which would allow us to use
# this function. But then we'd have to deal XMLFlatListTree differently.
#
function(suggestion = "", env) {
# the check to see if suggestion is a name in env is very expensive? Is it?
if(suggestion == "" || exists(suggestion, env, inherits = FALSE))
as.character(length(objects(env))) # .count + 1)
else
suggestion
}
asXMLTreeNode =
function(node, env,
id = get(".nodeIdGenerator", env)(xmlName(node)), # nodeIdGenerator(xmlName(node), env),
className = "XMLTreeNode")
{
node$id = id
node$env = env
class(node) = c(className, class(node))
node
}
addParentNode =
function(node, kids = character())
{
if(!inherits(node, 'XMLTreeNode')) {
node = asXMLTreeNode(node, .this)
}
id = node$id
.children[[ id ]] <<- kids
.parents[ kids ] <<- id
.nodes[[ id ]] <<- node
id
}
addNode.XMLFlatListTree =
function(node, parent)
{
e = parent$env
if(!("id" %in% names(unclass(node))))
node$id = get(".nodeIdGenerator", e)(xmlName(node))
node$env = parent$env
id = node$id
nodes <- get(".nodes", e)
nodes[[ id ]] <- node
assign(".nodes", nodes, e)
p = get(".parents", e)
p[id] = parent$id
assign(".parents", p, e)
kids = get(".children", e)
kids[[ parent$id ]] <- c(kids[[ parent$id ]] , node$id)
assign(".children", kids, e)
node
}
names.XMLFlatTree =
function(x) {
names(get(".nodes", x))
}
"$.XMLFlatListTree" =
function(x, name) {
get(".nodes", envir = x)[[name]]
}
xmlRoot.XMLFlatTree =
function(x, skip = TRUE, ...)
{
p = get(".parents", x)
#XXX
}
xmlChildren.XMLTreeNode =
function(x, addNames = TRUE, ...)
{
e = x$env
kids = get(".children", e)
nodes = get(".nodes", e)
ans = if(x$id %in% names(kids))
nodes[ kids[[ x$id ]] ]
else
list()
structure(ans, class = "XMLNodeList")
}
if(useS4)
setMethod("xmlChildren", "XMLTreeNode", xmlChildren.XMLTreeNode)
setMethod("xmlParent", "XMLTreeNode",
function(x, ...)
{
p = get(".parents", x$env)
idx = match(x$id, names(p))
if(is.na(idx))
return(NULL)
get(".nodes", x$env)[[ p[x$id] ]]
})
xmlToList =
function(node, addAttributes = TRUE, simplify = FALSE)
{
if(is.character(node))
node = xmlParse(node)
if(inherits(node, "XMLAbstractDocument"))
node = xmlRoot(node)
if(any(inherits(node, c("XMLTextNode", "XMLInternalTextNode"))))
xmlValue(node)
else if(xmlSize(node) == 0)
xmlAttrs(node)
else {
if(is.list(node)) { # inherits(node, "XMLAbstractNode"))
tmp = vals = xmlSApply(node, xmlToList, addAttributes)
tt = xmlSApply(node, inherits, c("XMLTextNode", "XMLInternalTextNode"))
} else {
tmp = vals = (if(simplify) xmlSApply else xmlApply)(node, xmlToList, addAttributes)
tt = xmlSApply(node, inherits, c("XMLTextNode", "XMLInternalTextNode"))
}
vals[tt] = (if(simplify) sapply else lapply)(vals[tt], function(x) x[[1]])
if(length(attrs <- xmlAttrs(node)) > 0) {
if(addAttributes)
vals[[".attrs"]] = attrs
else
attributes(vals) = as.list(attrs)
}
if(any(tt) && length(vals) == 1)
vals[[1]]
else
vals
}
}
indexOfNode =
#XXX Do this for hash trees.
function(x)
{
if(!inherits(x, "XMLInternalNode"))
stop("must be an internal node")
.Call("R_XML_indexOfChild", x, PACKAGE = "XML")
}
XML/R/readHTMLTable.R 0000644 0001751 0000144 00000022073 13610555150 013631 0 ustar hornik users
trim =
function(x)
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
textNodesOnly =
# Only process the top-level text nodes, not recursively.
# Could be done as simply as
# xmlValue(x, recursive = FALSE)
function(x)
paste(xmlSApply(x, function(n) if(is(n, "XMLInternalTextNode")) xmlValue(n) else ""), collapse = "")
toNumber =
function(x)
{
as.numeric(gsub("[%,]", "", x))
}
if(FALSE) {
doc = htmlParse("http://elections.nytimes.com/2008/results/states/president/california.html")
tbls = getNodeSet(doc, "//table[not(./tbody)]|//table/tbody")
o = readHTMLTable(tbls[[1]], skip.rows = c(1, Inf), header = FALSE, colClasses = c("character", replicate(5, toNumber)), elFun = textOnly)
o = readHTMLTable("http://elections.nytimes.com/2008/results/states/president/california.html")
x = readHTMLTable("http://www.usatoday.com/news/politicselections/vote2004/CA.htm", as.data.frame = FALSE)
}
setGeneric("readHTMLTable",
function(doc, header = NA,
colClasses = NULL, skip.rows = integer(), trim = TRUE, elFun = xmlValue,
as.data.frame = TRUE, which = integer(), ...)
standardGeneric("readHTMLTable"))
setMethod("readHTMLTable", "character",
function(doc, header = NA,
colClasses = NULL, skip.rows = integer(), trim = TRUE, elFun = xmlValue,
as.data.frame = TRUE, which = integer(), encoding = character(), ...) {
pdoc = htmlParse(doc, encoding = encoding)
readHTMLTable(pdoc, header, colClasses, skip.rows, trim, elFun, as.data.frame, which, ...)
})
# XXX Should vectorize in header, colClasses, i.e. allow different values for different tables.
setMethod("readHTMLTable", "HTMLInternalDocument",
function(doc, header = NA,
colClasses = NULL, skip.rows = integer(), trim = TRUE, elFun = xmlValue,
as.data.frame = TRUE, which = integer(), ...)
{
# tbls = getNodeSet(doc, "//table[not(./tbody)]|//table/tbody")
tbls = getNodeSet(doc, "//table") # XXX probably want something related to nested tables
# "//table[not(ancestor::table)]" -> outer ones
# if header is missing, compute it each time.
if(length(which))
tbls = tbls[which]
# ans = lapply(tbls, readHTMLTable, header, colClasses, skip.rows, trim, elFun, as.data.frame, ...)
header = rep(header, length = length(tbls))
ans = mapply(readHTMLTable,
tbls, header,
MoreArgs = list(colClasses = colClasses, skip.rows = skip.rows, trim = trim, elFun = elFun, as.data.frame = as.data.frame, ...),
SIMPLIFY = FALSE)
names(ans) = sapply(tbls, getHTMLTableName)
if(length(which) && length(tbls) == 1)
ans[[1]]
else
ans
})
getHTMLTableName =
function(node)
{
id = xmlGetAttr(node, "id")
if(!is.null(id))
return(id)
cap = getNodeSet(node, "./caption")
if(length(cap))
return(xmlValue(cap[[1]]))
}
setClass("FormattedNumber", contains = "numeric")
setClass("FormattedInteger", contains = "integer")
setAs('character', 'FormattedNumber', function(from) as.numeric(gsub(",", "", from)))
setAs('character', 'FormattedInteger', function(from) as.integer(gsub(",", "", from)))
setClass("Currency", contains = "numeric")
setAs("character", "Currency",
function(from)
as.numeric(gsub("[$,]", "", from)))
setClass("Percent", contains = "numeric")
setAs('character', 'Percent', function(from) as.numeric(gsub("%", "", from)))
setMethod("readHTMLTable", "XMLInternalElementNode",
#readHTMLTable.XMLInternalElementNode =
#
#
# header is computed based on whether we have a table node and it has a thead.
# (We don't currently bother with the col spans.)
#
# colClasses can be a character vector giving the name of the type for a column,
# an NULL to drop the corresponding column, or a function in which case it will
# be passed the contents of the column and can transform it as it wants.
# This allows us to clean text before converting it.
#
# skip.rows - an integer vector indicating which rows to ignore.
#
# trim - a logical indicating whether to trim white space from the start and end of text.
#
# elFun - a function which is called to process each th or td node to extract the content.
# This is typically xmlValue, but one can supply others (e.g. textNodesOnly)
# as.data.frame
#
function(doc, header = NA ,
colClasses = NULL, skip.rows = integer(), trim = TRUE, elFun = xmlValue,
as.data.frame = TRUE, encoding = 0L, ...)
{
node = doc
headerFromTable = FALSE
dropFirstRow = FALSE
# check if we have a header
if(length(header) == 1 && is.na(header)) # this node was doc
header = (xmlName(doc) %in% c("table", "tbody") &&
("thead" %in% names(doc) || length(getNodeSet(node, "./tr[1]/th | ./tr[1]/td")) > 0))
if(is.logical(header) && (is.na(header) || header) && xmlName(node) == "table") {
if("thead" %in% names(node))
header = node[["thead"]]
else {
if("tr" %in% names(node))
tmp = node[["tr"]]
else
tmp = node[["tbody"]][["tr"]]
if(!is.null(tmp) && all(names(tmp) %in% c('text', 'th'))) {
header = xpathSApply(tmp, "./th | ./td", xmlValue, encoding = encoding)
dropFirstRow = TRUE
}
}
}
# Moved this from before the check for header as we set node here and that seems
# premature. Checked on
# readHTMLTable("http://www.google.com/finance?q=NASDAQ:MSFT&fstype=ii", header = TRUE, which = 1)
tbody = getNodeSet(node, "./tbody")
if(length(tbody))
node = tbody[[1]]
if(is(header, "XMLInternalElementNode")) {
# get the last tr in the thead
if(xmlName(header) == "thead") {
i = which(names(header) == "tr")
header = header[[ i [ length(i) ] ]]
xpath = "./th | ./td"
} else
xpath = "./*/th | ./*/td"
header = as.character(xpathSApply(header, xpath, elFun, encoding = encoding))
headerFromTable = TRUE
if(xmlName(node) == "table" && "tbody" %in% names(node))
node = node[["tbody"]]
}
# Process each row, by getting the content of each "cell" (th/td)
rows = getNodeSet(node, ".//tr")
if(dropFirstRow)
rows = rows[-1]
els = lapply(rows, function(row) {
tmp = xpathSApply(row, "./th|./td", elFun)
if(trim)
trim(tmp)
else
tmp
})
# spans = getNodeSet(node, ".//td[@rowspan] | .//th[@rowspan]")
if(length(skip.rows)) {
infs = (skip.rows == Inf)
if(any(infs))
# want Inf - 2, Inf - 1, Inf, to indicate drop last 3, but that won't work
# take sequence of Inf to identify Inf - 2, Inf - 1, Inf
skip.rows[skip.rows == Inf] = length(els) - seq(0, length = sum(infs))
els = els[ - skip.rows ]
}
if(length(els) == 0)
return(NULL)
numEls = sapply(els, length)
# els[[1]] should be a scalar
if(is.logical(header) && !is.na(header) && header && any(nchar(els[[1]]) < 999)) {
header = els[[1]]
els = els[-1]
numEls = numEls[ - 1]
}
if(length(els) == 0)
return(NULL) #XXX we should have a header here so return a data frame with 0 rows.
ans = lapply(seq(length = max(numEls)),
function(col) {
sapply(els, `[`, col)
})
if(is.character(header) && length(header) == length(ans))
names(ans) = header
if(length(colClasses)) {
colClasses = rep(colClasses, length = length(ans))
n = sapply(colClasses, function(x) is.null(x) || x == "NULL")
if(any(n)) {
ans = ans[ ! n ]
colClasses = colClasses[ ! n ]
}
ans = lapply(seq(along = ans) ,
function(i)
if(is.function(colClasses[[i]]))
colClasses[[i]](ans[[i]])
else if(colClasses[[i]] == "factor")
factor(ans[[i]])
else if(colClasses[[i]] == "ordered")
ordered(ans[[i]])
else
as(ans[[i]], colClasses[[i]])
)
}
if(as.data.frame) {
ans = as.data.frame(ans, ...)
if(is.character(header) && length(header) == length(ans))
names(ans) = header
else if(nrow(ans) > 0)
names(ans) = paste("V", seq(along = ans), sep = "")
}
ans
})
getTableWithRowSpan =
function(node, r = xmlSize(node),
c = max(xmlSApply(node, function(x) length(getNodeSet(x, "./td | ./th")))),
encoding = 0L)
{
ans = matrix(NA_character_, r, c)
for(i in seq(length = r)) {
col = 1
kids = getNodeSet(node[[i]], "./th | ./td")
for(k in seq(along = kids)) {
sp = xmlGetAttr(k, "rowspan", 1)
ans[seq(i, length = sp)] = xmlValue(k, encoding = encoding)
}
}
}
XML/R/keyValueDB.R 0000644 0001751 0000144 00000004676 13607633705 013276 0 ustar hornik users # All the possible names, empirically
# table(xpathSApply(it, "//*", xmlName))
# On a file that has 549,286 nodes (JasonLibrary.xml)
# this took 37 seconds to process the entire file.
#
# array data date dict false integer key plist string
# 5 14 22104 33103 3 133465 263594 1 96631
# true
# 366
setGeneric("readKeyValueDB",
function(doc, ...)
standardGeneric("readKeyValueDB"))
setMethod("readKeyValueDB", "character",
function(doc, ...)
readKeyValueDB(xmlParse(doc), ...))
setMethod("readKeyValueDB", "AsIs",
function(doc, ...)
readKeyValueDB(xmlParse(doc), ...))
setMethod("readKeyValueDB", "XMLInternalDocument",
function(doc, ...)
readKeyValueDB(xmlRoot(doc)[["dict"]], ...))
setMethod("readKeyValueDB", "XMLInternalNode",
function(doc, dropComments = TRUE, ...) {
kids = xmlChildren(doc)
if(dropComments)
kids = kids[!sapply(kids, is, "XMLInternalCommentNode")]
if(length(kids) == 0)
return(list())
i = seq(by = 2, length = length(kids)/2)
keys = sapply(kids[ i ], xmlValue)
structure(lapply(kids[i + 1], readPlistNodeValue, dropComments = dropComments, ...),
names = keys)
})
readPlistNodeValue =
function(node, dropComments = TRUE)
{
id = xmlName(node)
switch(id, integer = if(abs(tmp <- as.numeric(xmlValue(node))) > .Machine$integer.max) tmp else as.integer(xmlValue(node)),
string = xmlValue(node),
data = xmlValue(node),
key = xmlValue(node),
dict = readKeyValueDB(node),
true = TRUE,
false = FALSE,
date = as.POSIXct(strptime(xmlValue(node), "%Y-%m-%dT%H:%M:%SZ")),
array = { tmp = if(dropComments)
node[!xmlSApply(node, inherits, "XMLInternalCommentNode")]
else
xmlChildren(node)
# We might want lapply/xmlApply() so as to allow different types with classes,
# e.g. POSIXct that doesn't collapse down to a string or whatever.
sapply(tmp, readPlistNodeValue)
}
)
}
readPlist = readKeyValueDB
XML/R/supports.R.in 0000644 0001751 0000144 00000000240 13607633674 013573 0 ustar hornik users supportsExpat <-
function()
{
@SUPPORTS_EXPAT@
}
supportsLibxml <-
function()
{
@SUPPORTS_LIBXML@
}
ADD_XML_OUTPUT_BUFFER = @ADD_XML_OUTPUT_BUFFER@ > 0
XML/R/DTD.R 0000644 0001751 0000144 00000005270 13607633666 011713 0 ustar hornik users dtdIsAttribute <-
function(name, element, dtd)
{
if(!inherits(element,"XMLElementDef")) {
element <- dtdElement(as.character(element), dtd)
}
# return(!is.na(amatch(name, names(element$attributes))))
return(!is.na(match(name, names(element$attributes))))
}
dtdValidElement <-
#
# checks whether an XML element named `name'
# can be inserted into an element named `within'
# as defined in the specific DTD, optionally
# specifying the position the `name' element would
# be added.
#
# Ideally, this would be used when writing to an XML stream
# (doesn't exist in R or S, yes).
# The stream would monitor the currently open tags
# (as a stack) and would be able to test whether a new
# insertion was valid.
function(name, within, dtd, pos=NULL)
{
el <- dtdElement(within, dtd)
if(is.null(el))
stop(paste("No such element \"",within,"\" in DTD",sep="", collapse=""))
return(dtdElementValidEntry(el, name,pos=pos))
}
dtdElementValidEntry <-
function(element, name, pos=NULL)
{
UseMethod("dtdElementValidEntry", element) # , name, pos)
}
dtdElementValidEntry.XMLElementDef <-
function(element, name, pos=NULL)
{
return(dtdElementValidEntry(element$contents,name,pos=pos))
}
dtdElementValidEntry.XMLOrContent <-
function(element, name, pos=NULL)
{
for(i in element$elements) {
if(dtdElementValidEntry(i, name, pos=pos))
return(TRUE)
}
return(FALSE)
}
dtdElementValidEntry.XMLElementContent <-
function(element, name, pos=NULL)
{
# if there are no sub-element types, then can't be here.
# Might check this is a PCDATA by looking at the type.
if(is.null(element$elements)) {
return(FALSE)
}
return( any(element$elements == name) )
}
dtdElementValidEntry.character <-
function(element, name, pos=NULL)
{
return(element == name)
}
dtdElementValidEntry.XMLSequenceContent <-
function(element, name, pos=NULL)
{
if(!is.null(pos)) {
tmp <- element$elements[[as.integer(pos)]]
if(!is.null(tmp))
return(dtdElementValidEntry(tmp))
else
return(FALSE)
}
for(i in element$elements) {
if(dtdElementValidEntry(i, name)) {
return(TRUE)
}
}
return(FALSE)
}
xmlContainsEntity <-
#
# Determine if a particular entity is defined
# within the DTD.
#
function(name, dtd)
{
return(!is.na(match(name,dtd$entities)))
}
xmlContainsElement <-
#
# Determine if a particular entity is defined
# within the DTD.
#
function(name, dtd)
{
return(!is.na(match(name,dtd$element)))
}
dtdEntity <-
#
# Retrieves the specified entity from the DTD definition.
# Uses the `dtd$entitities' list.
#
function(name, dtd)
{
dtd$entities[[name]]
}
dtdElement <-
#
# Retrieves the specified element from the DTD definition.
# Uses the `dtd$elements' list.
function(name, dtd)
{
dtd$elements[[name]]
}
XML/R/saveXML.R 0000644 0001751 0000144 00000012102 13607633674 012606 0 ustar hornik users if(FALSE) {
saveXML <-
function(doc, file=NULL, compression=0, indent=TRUE, prefix = '\n',
doctype = NULL, encoding = getEncoding(doc), ...)
{
UseMethod("saveXML")
}
}
saveXML.XMLInternalNode <-
function(doc, file = NULL, compression = 0, indent = TRUE, prefix = '\n',
doctype = NULL, encoding = getEncoding(doc), ...)
{
if(is.na(encoding) || length(encoding) == 0 || encoding == "")
encoding = character()
ans = .Call("RS_XML_printXMLNode", doc, as.integer(0), as.integer(indent), as.logical(indent),
as.character(encoding), getEncodingREnum(as.character(encoding)), PACKAGE = "XML")
if(length(file)) {
cat(ans, file = file)
file
} else
ans
}
saveXML.XMLInternalDocument <-
function(doc, file = NULL, compression = 0, indent = TRUE,
prefix = '\n', doctype = NULL, encoding = getEncoding(doc), ...)
{
havePrefix = !missing(prefix)
isDocType = is(doctype, "Doctype")
if(isDocType) {
# Check that the value in the DOCTYPE for the top-level name is the same as that of the
# root element
topname = xmlName(xmlRoot(doc))
if(doctype@name == "")
doctype@name = topname
else if(topname == doctype@name)
stop("The top-level node and the name for the DOCTYPE must agree", doctype@name, " ", topname)
prefix = c(doctype@name, doctype@public, doctype@system)
}
if(length(file))
file = path.expand(file)
if(is.na(encoding))
encoding = "" #character()
ans = .Call("R_saveXMLDOM", doc, file, as.integer(compression), as.logical(indent),
if(is.character(prefix)) prefix else character(),
as.character(encoding), # getEncodingREnum(as.character(encoding)),
PACKAGE = "XML")
if(!isDocType && havePrefix) {
prefix = as(prefix, "character") # allow for an XMLInternalNode.
if(length(file)) {
txt = c(prefix, readLines(file)[-1])
cat(txt, file = file)
} else {
tmp = strsplit(ans, "\\\n")[[1]]
tmp = c(prefix, tmp[-1])
ans = paste(tmp, collapse = "\n")
}
}
if(length(file))
file
else
ans
}
saveXML.XMLInternalDOM <-
function(doc, file=NULL, compression=0, indent=TRUE, prefix = '\n',
doctype = NULL, encoding = getEncoding(doc), ...)
{
saveXML(doc$value(), file, compression, indent, prefix, doctype, encoding)
}
saveXML.XMLOutputStream =
function(doc, file = NULL, compression = 0, indent = TRUE, prefix = '\n',
doctype = NULL, encoding = getEncoding(doc), ...)
{
saveXML(doc$value(), file, compression, indent, prefix, doctype, encoding)
}
saveXML.sink =
#
# Need to handle a DTD here as the prefix argument..
#
function(doc, file = NULL, compression = 0, indent = TRUE, prefix = '\n',
doctype = NULL, encoding = getEncoding(doc), ...)
{
asString = is.null(file)
if(asString)
file = textConnection(NULL, "w")
if(inherits(file, c("character", "connection"))) {
sink(file)
on.exit(sink())
}
if(asString)
on.exit(close(file), add = TRUE)
if(!is.null(prefix))
cat(as.character(prefix))
if(!is.null(doctype))
cat(as(doctype, "character"), '\n')
#XXX Should we return file if it is not missing() || NULL ???
print(doc)
if(asString)
paste(textConnectionValue(file), collapse = "\n")
else
file
}
saveXML.XMLNode = saveXML.sink
saveXML.XMLFlatTree = saveXML.sink
setGeneric("saveXML",
function(doc, file=NULL, compression=0, indent=TRUE, prefix = '\n',
doctype = NULL, encoding = getEncoding(doc), ...)
standardGeneric("saveXML"))
setMethod("saveXML", "XMLInternalNode", saveXML.XMLInternalNode)
setMethod("saveXML", "XMLInternalDocument", saveXML.XMLInternalDocument)
setMethod("saveXML", "XMLInternalDOM", saveXML.XMLInternalDOM)
setMethod("saveXML", "XMLOutputStream", saveXML.XMLOutputStream)
setMethod("saveXML", "XMLNode", saveXML.sink)
setOldClass("XMLFlatTree")
setOldClass(c("XMLFlatListTree", "XMLFlatTree"))
setMethod("saveXML", "XMLFlatTree", saveXML.sink)
setMethod("saveXML", "HTMLInternalDocument",
function(doc, file = NULL, compression = 0, indent = TRUE,
prefix = '\n', doctype = NULL, encoding = "", ...) {
if(ADD_XML_OUTPUT_BUFFER) {
if(length(file) && is.character(file))
out = file
else
out = tempfile()
} else
out = character()
ans = .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), as.character(encoding),
as.logical(indent), as.character(out), PACKAGE = "XML")
if(length(file) && length(out) == 0) {
cat(ans, file = file)
file
} else if(length(out)) {
paste(readLines(out), collapse = "\n")
} else {
ans
}
})
XML/R/summary.R 0000644 0001751 0000144 00000003260 13607633667 012773 0 ustar hornik users
xmlElementSummaryHandlers =
#
# Functions for the event parser that we can use
# to count the occurrences of each tag and the attributes
function(file = character(), countAttributes = TRUE)
{
# Collect a list of attributes for each element.
tags = list()
# frequency table for the element names
counts = integer()
start =
function(name, attrs, ...) {
if(name == "xi:include") {
# need to handle the xpointer piece
# and the relative path names - done with getRelativeURL
href = getRelativeURL(attrs['href'], dirname(file), sep = .Platform$file.sep)
xmlElementSummary(href, funs)
}
if(!countAttributes)
tags[[name]] <<- unique(c(names(attrs), tags[[name]]))
else {
x = tags[[name]]
i = match(names(attrs), names(x))
if(any(!is.na(i)))
x[i[!is.na(i)]] = x[i[!is.na(i)]] + 1
if(any(is.na(i)))
x[names(attrs)[is.na(i)]] = 1
tags[[name]] <<- x
}
counts[name] <<- if(is.na(counts[name])) 1 else counts[name] + 1
}
funs =
list(.startElement = start,
.getEntity = function(x, ...) "xxx",
.getParameterEntity = function(x, ...) "xxx",
result = function() list(nodeCounts = sort(counts, decreasing = TRUE), attributes = tags))
}
xmlElementSummary =
function(url, handlers = xmlElementSummaryHandlers(url))
{
handlers
if(file.exists(url) && file.info(url)[1, "isdir"])
url = list.files(url, pattern = "\\.xml$", full.names = TRUE)
if(length(url) > 1)
lapply(url, xmlElementSummary, handlers)
else
xmlEventParse(url, handlers, replaceEntities = FALSE)
handlers$result()
}
XML/R/parser.R 0000644 0001751 0000144 00000003411 14216306410 012544 0 ustar hornik users # cat(paste(names(parserOptions), paste(2, 0:(length(parserOptions) - 1), sep = "^"), sep = " = ", collapse = "\n"))
#setClass("XMLParserOption", "EnumValue")
parserOptions =
structure(c(1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048,
4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576
), .Names = c("RECOVER", "NOENT", "DTDLOAD", "DTDATTR", "DTDVALID",
"NOERROR", "NOWARNING", "PEDANTIC", "NOBLANKS", "SAX1", "XINCLUDE",
"NONET", "NODICT", "NSCLEAN", "NOCDATA", "NOXINCNODE", "COMPACT",
"OLD10", "NOBASEFIX", "HUGE", "OLDSAX"))
RECOVER = 2^0
NOENT = 2^1
DTDLOAD = 2^2
DTDATTR = 2^3
DTDVALID = 2^4
NOERROR = 2^5
NOWARNING = 2^6
PEDANTIC = 2^7
NOBLANKS = 2^8
SAX1 = 2^9
XINCLUDE = 2^10
NONET = 2^11
NODICT = 2^12
NSCLEAN = 2^13
NOCDATA = 2^14
NOXINCNODE = 2^15
COMPACT = 2^16
OLD10 = 2^17
NOBASEFIX = 2^18
HUGE = 2^19
OLDSAX = 2^20
## This can (and does) send illegal file names to file.exists(), which
## has been protected against that but should really be caught here.
xmlParseDoc =
function(file, options = 1L, encoding = character(), asText = !file.exists(file),
baseURL = file)
{
if(is.character(options)) {
i = pmatch(options, names(parserOptions))
if(any(is.na(i)))
stop("unrecognized XML parser options: ", paste(options[is.na(i)], collapse = ", "))
options = parserOptions[i]
} else {
if(!all(options %in% parserOptions))
stop("unrecognized XML parser options: ", paste(options[!(options %in% parserOptions)], collapse = ", "))
}
options = as.integer(sum(options))
if(asText)
.Call("R_xmlReadMemory", file, nchar(file), as.character(encoding), options, as.character(baseURL), PACKAGE = "XML")
else
.Call("R_xmlReadFile", path.expand(file), as.character(encoding), options, PACKAGE = "XML")
}
XML/R/reflection.R 0000644 0001751 0000144 00000012213 13610555150 013406 0 ustar hornik users # These are functions that examine an XML node and
# defines a class for each complex type.
#
# Need to make this work recursively
#
xmlToS4List =
function(from, class = xmlName(from), type = gsub("s$", "", xmlName(from)))
{
new(class, xmlApply(from, as, type))
}
setGeneric("xmlToS4",
function(node, obj = new(xmlName(node)), ...)
standardGeneric("xmlToS4")
)
setMethod("xmlToS4", "XMLInternalNode",
function(node, obj = new(xmlName(node)), ...)
{
if(is(obj, "character") && !isS4(obj))
obj = new(obj)
# if(xmlSize(node) == 1 && node[[1]])
# return(as())
ids = names(node)
nodes = xmlChildren(node)
obj = addXMLAttributes(obj, xmlAttrs(node, addNamespacePrefix = TRUE))
slotIds = slotNames(obj)
slots = getClass(class(obj))@slots
if(any(duplicated(ids))) {
# experimenting with a different way of doing this.
# Group the nodes with the same names and the process those.
groupedNodes = split(nodes, ids)
ids = intersect(names(groupedNodes), slotIds)
for(i in ids) {
tmp = groupedNodes[[i]]
slot = slots[[i]]
if(length(tmp) > 1) {
val = lapply(tmp, convertNode, slot)
val = if(isAtomicType(slot))
unlist(val)
else
as(val, slot) # may be a specific sub-type of list
} else {
el = tmp[[1]]
val = convertNode(el, slot)
}
slot(obj, i) <- val
}
} else {
# This was the original mechanism but it doesn't handle multiple nodes of the same name.
for(i in seq(along = nodes)) {
if(ids[i] %in% slotIds) {
val = if(slots[[ids[i]]] == "character")
xmlValue(nodes[[i]])
else
tryCatch(as(nodes[[i]], slots[[ids[i]]]),
error = function(e)
xmlToS4(nodes[[i]]))
slot(obj, ids[i]) <- val # xmlToS4(nodes[[i]])
}
# obj = addAttributes(obj, xmlAttrs(nodes[[i]]))
}
}
obj
})
convertNode =
function(el, slot)
{
if(slot == "character")
xmlValue(el)
else
tryCatch(as(el, slot),
error = function(e)
xmlToS4(el))
}
isAtomicType =
#
# check if className refers to a primitive/atomic type
# or not.
function(className)
{
atomicTypes = c("logical", "integer", "numeric", "character")
if(className %in% atomicTypes)
return(TRUE)
k = getClassDef(className)
length(intersect(names(k@contains), atomicTypes)) > 0
}
addXMLAttributes =
function(obj, attrs)
{
slots = getClass(class(obj))@slots
i = match(names(attrs), names(slots))
# handle any namespace prefix
if(any(is.na(i))) {
w = grepl(":", names(attrs)) & is.na(i)
if(any(w))
i[which(w)] = match(gsub(".*:", "", names(attrs)[which(w)]), names(slots))
}
m = i
if(any(!is.na(i))) {
vals = structure(attrs[!is.na(i)], names = names(slots)[i [!is.na(i)] ])
for(i in names(vals))
slot(obj, i) <- as(vals[i], slots[[i]])
}
obj
}
makeClassTemplate =
#
# Get the class representation information to represent the contents of
# an XML node.
#
#
function(xnode, types = character(), default = "ANY", className = xmlName(xnode),
where = globalenv())
{
user.types = types
slots = names(xnode)
types =
xmlSApply(xnode, function(x) {
if(xmlSize(x) == 0)
default
else if(xmlSize(x) == 1 || is(x, "XMLInternalTextNode"))
"character"
else
xmlName(x)
})
names(types) = slots
types[names(xmlAttrs(xnode))] = "character"
if(length(user.types))
types[names(user.types)] = user.types
coerce = sprintf("setAs('XMLAbstractNode', '%s', function(from) xmlToS4(from))", className)
def = if(length(types))
sprintf("setClass('%s',\n representation(%s))", className,
paste(sQuote(names(types)), sQuote(types), sep = " = ", collapse = ",\n\t"))
else
sprintf("setClass('%s')", className)
if(!is.null(where) && !(is.logical(where) && !where)) {
eval(parse(text = def), envir = where)
eval(parse(text = coerce), envir = where)
}
list(name = className, slots = types,
def = def, coerce = coerce)
}
setAs("XMLAbstractNode", "integer",
function(from)
as.integer(xmlValue(from)))
setAs("XMLAbstractNode", "numeric",
function(from)
as.numeric(xmlValue(from)))
setAs("XMLAbstractNode", "character",
function(from)
xmlValue(from))
setAs("XMLAbstractNode", "URL",
function(from)
new("URL", xmlValue(from)))
setAs("XMLAbstractNode", "logical",
function(from)
as.logical(xmlValue(from)))
setAs("XMLAbstractNode", "Date",
function(from)
as.Date(xmlValue(from), "%Y-%m-%d"))
setAs("XMLAbstractNode", "POSIXct",
function(from)
as.POSIXct(strptime(xmlValue(from), "%Y-%m-%d %H:%M:%S")))
makeXMLClasses =
function(doc, omit = character(), eval = FALSE)
{
a = getNodeSet(doc, "//*")
ids = unique(sapply(a, xmlName))
if(length(omit))
ids = setdiff(ids, omit)
lapply(ids, function(id) makeClassTemplate(getNodeSet(doc, sprintf("//%s", id))[[1]], where = eval))
}
XML/R/assignChild.R 0000644 0001751 0000144 00000002277 14405636156 013526 0 ustar hornik users "[<-.XMLNode" <-
function(x, i, value)
{
x$children[i] <- value
if(!is.character(i)) {
names(x$children)[i] =
if(inherits(value, "XMLNode"))
xmlName(value)
else
sapply(value, xmlName)
}
x
}
"[[<-.XMLNode" <-
function(x, i, value)
{
x$children[[i]] <- value
if(!is.character(i)) {
names(x$children)[i] =
if(inherits(value, "XMLNode"))
xmlName(value)
else
sapply(value, xmlName)
}
x
}
append.xmlNode <-
function(to, ...)
{
UseMethod("append.xmlNode")
}
append.XMLNode <-
function(to, ...)
{
args <- list(...)
if(!inherits(args[[1]], "XMLNode") && is.list(args[[1]]))
args <- args[[1]]
idx <- seq(length(to$children) + 1, length=length(args))
args = addNames(args)
if(is.null(to$children))
to$children <- args
else {
to$children[idx] <- args
names(to$children)[idx] <- names(args)
}
to
}
append.xmlNode.default <-
function(to, ...)
base::append(to, ...)
if(FALSE) {
xmlAddChild <-
function(node, child) {
node$children <- append(node$children, list(child))
names(node$children) <- sapply(node$children,xmlName)
node
}
}
XML/R/SAXMethods.R 0000644 0001751 0000144 00000010407 13610046416 013236 0 ustar hornik users .InitSAXMethods <-
# Defines S4 classes for use with the SAX parser and specifically to do with the
# state variable.
# This also defines methods for the
function(where = "package:XML") {
# require(methods)
setClass("SAXState", "VIRTUAL", where = where)
setGeneric("startElement.SAX", function(name, atts, .state = NULL) standardGeneric("startElement.SAX"), where = where)
setGeneric("endElement.SAX", function(name, .state = NULL) { standardGeneric("endElement.SAX")}, where = where)
setGeneric("comment.SAX", function(content, .state = NULL) { standardGeneric("comment.SAX")}, where = where)
# Note that we drop the . here.
setGeneric("text.SAX", function(content, .state = NULL) { standardGeneric("text.SAX")}, where = where)
setGeneric("cdata.SAX", function(content, .state = NULL) { standardGeneric("cdata.SAX")}, where = where)
setGeneric("processingInstruction.SAX", function(target, content, .state = NULL) { standardGeneric("processingInstruction.SAX")}, where = where)
setGeneric("entityDeclaration.SAX", function(name, base, sysId, publicId, notationName, .state = NULL) { standardGeneric("entityDeclaration.SAX")}, where = where)
setMethod("startElement.SAX", signature(.state = "SAXState"),
function(name, atts, .state = NULL) .state, where = where)
setMethod("endElement.SAX", signature(.state = "SAXState"),
function(name, .state = NULL) .state, where = where)
setMethod("comment.SAX", signature(.state = "SAXState"),
function(content, .state = NULL) .state, where = where)
setMethod("text.SAX", signature(.state = "SAXState"),
function(content, .state = NULL) .state, where = where)
setMethod("processingInstruction.SAX", signature(.state = "SAXState"),
function(target, content, .state = NULL) .state, where = where)
setMethod("entityDeclaration.SAX", signature(.state = "SAXState"),
function(name, base, sysId, publicId, notationName, .state = NULL) .state, where = where)
return(TRUE)
}
.useNamespacesInXMLPackage = FALSE
if(!.useNamespacesInXMLPackage) {
setClass("SAXState", "VIRTUAL")
setGeneric("startElement.SAX", function(name, atts, .state = NULL) standardGeneric("startElement.SAX"))
setGeneric("endElement.SAX", function(name, .state = NULL) { standardGeneric("endElement.SAX")})
setGeneric("comment.SAX", function(content, .state = NULL) { standardGeneric("comment.SAX")})
# Note that we drop the . here.
setGeneric("text.SAX", function(content, .state = NULL) { standardGeneric("text.SAX")})
setGeneric("processingInstruction.SAX", function(target, content, .state = NULL) { standardGeneric("processingInstruction.SAX")})
setGeneric("entityDeclaration.SAX", function(name, base, sysId, publicId, notationName, .state = NULL) { standardGeneric("entityDeclaration.SAX")})
setMethod("startElement.SAX", signature(.state = "SAXState"),
function(name, atts, .state = NULL) .state)
setMethod("endElement.SAX", signature(.state = "SAXState"),
function(name, .state = NULL) .state)
setMethod("comment.SAX", signature(.state = "SAXState"),
function(content, .state = NULL) .state)
setMethod("text.SAX", signature(.state = "SAXState"),
function(content, .state = NULL) .state)
setMethod("processingInstruction.SAX", signature(.state = "SAXState"),
function(target, content, .state = NULL) .state)
setMethod("entityDeclaration.SAX", signature(.state = "SAXState"),
function(name, base, sysId, publicId, notationName, .state = NULL) .state)
}
genericSAXHandlers <-
function(include, exclude, useDotNames = FALSE)
{
if(!exists("startElement.SAX"))
stop("You must call .InitSAXMethods before calling genericSAXHandlers()n")
ans <- list(startElement = startElement.SAX,
endElement = endElement.SAX,
comment = comment.SAX,
text = text.SAX,
processingInstruction = processingInstruction.SAX,
entityDeclaration = entityDeclaration.SAX)
if(!missing(include))
ans <- ans[include]
else if(!missing(exclude)) {
which <- match(exclude, names(ans))
ans <- ans[-which]
}
if(useDotNames)
names(ans) = paste(".", names(ans), sep = "")
ans
}
XML/R/htmlParse.R 0000644 0001751 0000144 00000013645 13607633667 013245 0 ustar hornik users isURL =
function(file)
{
is.character(file) && grepl("^(http|ftp)", file)
}
############
#XXXXXXXXX
# This is now replaced by copying xmlTreeParse.
htmlTreeParse <-
#
# HTML parser that reads the entire `document' tree into memory
# and then converts it to an R/S object.
# Uses the libxml from Daniel Veillard at W3.org.
#
# asText treat the value of file as XML text, not the name of a file containing
# the XML text, and parse that.
# See also xml
#
function(file, ignoreBlanks = TRUE, handlers = NULL,
replaceEntities = FALSE, asText = inherits(file, "AsIs") || !isURL && grepl("^<", file), # could have a BOM
trim = TRUE,
isURL = is.character(file) && grepl("^(http|ftp)", file),
asTree = FALSE, useInternalNodes = FALSE,
encoding = character(),
useDotNames = length(grep("^\\.", names(handlers))) > 0,
xinclude = FALSE, addFinalizer = TRUE, error = function(...){},
options = integer(), parentFirst = FALSE)
{
if(TRUE)
{
doc = xmlTreeParse(file, ignoreBlanks, handlers, replaceEntities, asText, trim, validate = FALSE,
getDTD = FALSE, isURL, asTree, addAttributeNamespaces = FALSE,
useInternalNodes, isSchema = FALSE, fullNamespaceInfo = FALSE,
encoding, useDotNames, xinclude, addFinalizer, error, isHTML = TRUE, options = options)
class(doc) = c("HTMLInternalDocument", class(doc)[1])
return(doc)
}
if(length(file) > 1) {
file = paste(file, collapse = "\n")
if(!missing(asText) && !asText)
stop("multiple URIs passed to xmlTreeParse. If this is the content of the file, specify asText = TRUE")
asText = TRUE
}
if(missing(asText) && substring(file, 1, 1) == "<")
asText = TRUE
if(!asText && missing(isURL)) {
isURL <- length(grep("^(http|ftp)://", file, useBytes = TRUE, perl = TRUE))
}
# check whether we are treating the file name as
# a) the XML text itself, or b) as a URL.
# Otherwise, check if the file exists and report an error.
if(asText == FALSE && isURL == FALSE) {
if(file.exists(file) == FALSE)
stop(paste("File", file, "does not exist "))
}
if(!asText && !isURL)
file = path.expand(file)
old = setEntitySubstitution(replaceEntities)
on.exit(setEntitySubstitution(old))
if(!is.logical(xinclude)) {
if(inherits(xinclude, "numeric"))
xinclude = bitlist(xinclude)
else
xinclude = as.logical(xinclude)
}
.oldErrorHandler = setXMLErrorHandler(error)
on.exit(.Call("RS_XML_setStructuredErrorHandler", .oldErrorHandler, PACKAGE = "XML"), add = TRUE)
ans <- .Call("RS_XML_ParseTree", as.character(file), handlers,
as.logical(ignoreBlanks), as.logical(replaceEntities),
as.logical(asText), as.logical(trim),
FALSE, FALSE,
as.logical(isURL), FALSE,
as.logical(useInternalNodes), TRUE, FALSE, FALSE, as.character(encoding),
as.logical(useDotNames), xinclude, error, addFinalizer, options, as.logical(parentFirst), PACKAGE = "XML")
if(!missing(handlers) & !as.logical(asTree))
return(handlers)
if(inherits(ans, "XMLInternalDocument")) {
addDocFinalizer(ans, addFinalizer)
class(ans) = c("HTMLInternalDocument", class(ans))
}
ans
}
#XXXXXX
# This is another version that doesn't seem to release the document. Weird. I can't seem to find
# out who is holding onto it.
myHTMLParse =
function(file, ignoreBlanks = TRUE, handlers = NULL,
replaceEntities = FALSE, asText = inherits(file, "AsIs") || !isURL && grepl("^<", file), # could have a BOM
trim = TRUE,
isURL = is.character(file) && grepl("^(http|ftp)", file),
asTree = FALSE, useInternalNodes = FALSE,
encoding = character(),
useDotNames = length(grep("^\\.", names(handlers))) > 0,
xinclude = FALSE, addFinalizer = TRUE, error = function(...){})
{
doc = xmlTreeParse(file, ignoreBlanks, handlers, replaceEntities, asText, trim, validate = FALSE,
getDTD = FALSE, isURL, asTree, addAttributeNamespaces = FALSE,
useInternalNodes, isSchema = FALSE, fullNamespaceInfo = FALSE,
encoding, useDotNames, xinclude, addFinalizer, error, isHTML = TRUE)
class(doc) = c("HTMLInternalDocument", class(doc)[2])
return(doc)
}
hideParseErrors = function (...) NULL
htmlTreeParse = xmlTreeParse
formals(htmlTreeParse)$error = as.name("htmlErrorHandler") # as.name("hideParseErrors")
formals(htmlTreeParse)$isHTML = TRUE
htmlParse = htmlTreeParse
formals(htmlParse)$useInternalNodes = TRUE
parseURI =
function(uri)
{
if(is.na(uri))
return(structure(as.character(uri), class = "URI"))
u = .Call("R_parseURI", as.character(uri), PACKAGE = "XML")
if(u$port == 0)
u$port = as.integer(NA)
class(u) = "URI"
u
}
setOldClass("URI")
setOldClass("URL")
setAs("URI", "character",
function(from) {
if(from$scheme == "")
sprintf("%s%s%s",
from["path"],
if(from[["query"]] != "") sprintf("?%s", from[["query"]]) else "",
if(from[["fragment"]] != "") sprintf("#%s", from[["fragment"]]) else "" )
else
sprintf("%s://%s%s%s%s%s%s%s",
from[["scheme"]],
from[["user"]],
if(from[["user"]] != "") "@" else "",
from[["server"]],
if(!is.na(from[["port"]])) sprintf(":%d", as.integer(from[["port"]])) else "",
from["path"],
if(from[["query"]] != "") sprintf("?%s", from[["query"]]) else "",
if(from[["fragment"]] != "") sprintf("#%s", from[["fragment"]]) else ""
)
})
XML/R/error.R 0000644 0001751 0000144 00000007325 13607633667 012435 0 ustar hornik users xmlErrorCumulator =
function(class = "XMLParserErrorList", immediate = TRUE)
{
messages = character()
function(msg, ...) {
# curently discards all the extra information.
if(length(grep("\\\n$", msg)) == 0)
paste(msg, "\n", sep = "")
if(immediate)
cat(msg)
if(length(msg) == 0) {
# collapse into string. Probably want to leave as separate elements of a character vector.
# Make into real objects with the ... information.
e = simpleError(paste(1:length(messages), messages, sep = ": ",collapse = ""))
class(e) = c(class, class(e))
stop(e)
}
messages <<- c(messages, msg)
}
}
xmlStop =
#
# Never used anymore.
# Related to the non-structed error handling.
function(msg, class = "XMLParserError")
{
err = simpleError(msg)
class(err) = c(class , class(err))
stop(err)
}
makeXMLError =
function(msg, code, domain, line, col, level, filename, class = "XMLError")
{
err = simpleError(msg)
err$code = getEnumValue(code, xmlParserErrors)
err$domain = getEnumValue(domain, xmlErrorDomain)
err$line = line
err$col = col
err$level = getEnumValue(level, xmlErrorLevel)
err$filename = filename
class(err) = c(class, class(err))
err
}
htmlErrorHandler =
function(msg, code, domain, line, col, level, filename, class = "XMLError")
{
e = makeXMLError(msg, code, domain, line, col, level, filename, class)
dom = names(e$domain)
class(e) = c(names(e$code),
sprintf("%s_Error", gsub("_FROM_", "_", dom)),
class(e))
if(e$code == xmlParserErrors["XML_IO_LOAD_ERROR"])
stop(e)
}
xmlStructuredStop =
function(msg, code, domain, line, col, level, filename, class = "XMLError")
{
err = makeXMLError(msg, code, domain, line, col, level, filename, class)
stop(err)
}
xmlErrorFun =
function()
{
errors = list()
h = function(msg, code, domain, line, col, level, filename) {
if(length(msg) == 0)
return(TRUE)
err = list(msg = msg, code = code,
domain = domain, line = line,
col = col, level = level, filename = filename)
err = fixXMLError(err)
errors[[length(errors) + 1]] <<- err
}
structure(list(handler = h, errors = function() structure(errors, class = "XMLStructuredErrorList"), reset = function() errors <<- list),
class = "XMLStructuredErrorCumulator")
}
setOldClass("XMLStructuredErrorList")
print.XMLStructuredErrorList =
function(x, ...) {
if(length(x) == 0)
print(NULL)
else
print(t(sapply(x, function(x) unlist(x[c("line", "msg")]))))
}
getXMLErrors=
#
# This attempts to read the specified file using the function given in parse
# and then returns a list of the errors in the document.
# This a somewhat convenient mechanism for fixing up, e.g., malformed HTML
# pages or other XML documents.
function(filename, parse = xmlParse, ...)
{
f = xmlErrorFun()
opts = options()
options(error = NULL)
on.exit(options(opts))
tryCatch(parse(filename, ..., error = f$handler), error = function(e){})
f$errors()
}
# Low level error handler
setXMLErrorHandler =
function(fun)
{
prev = .Call("RS_XML_getStructuredErrorHandler", PACKAGE = "XML")
sym = getNativeSymbolInfo("R_xmlStructuredErrorHandler", "XML")$address
.Call("RS_XML_setStructuredErrorHandler", list(fun, sym), PACKAGE = "XML")
prev
}
fixXMLError =
function(err)
{
err$domain = getEnumValue(err$domain, xmlErrorDomain)
err$code = getEnumValue(err$code, xmlParserErrors)
err$level = getEnumValue(err$level, xmlErrorLevel)
class(err) = "XMLError"
err
}
getEnumValue =
function(value, defs)
{
# might use for the class.
name = substitute(defs)
i = which(value == defs)
defs[i]
}
XML/R/xmlIncludes.R 0000644 0001751 0000144 00000004440 14405636156 013557 0 ustar hornik users xmlXIncludes =
#
# This is similar to getXIncludes() but returns the hierarchical structure
# if desired.
#
function(filename, recursive = TRUE,
omitPattern = "\\.(js|html?|txt|R|c)$",
namespace = c(xi = "https://www.w3.org/2003/XInclude"),
addNames = TRUE,
clean = NULL, ignoreTextParse = FALSE)
{
doc = xmlParse(filename, xinclude = FALSE)
#if(filename == "./XPath/xpathApplyFunctionTable.xml") browser()
if(missing(namespace)) {
ns = xmlNamespaceDefinitions(doc, simplify = TRUE)
if("https://www.w3.org/2001/XInclude" %in% ns)
namespace = c(xi = "https://www.w3.org/2001/XInclude")
}
nodes = getNodeSet(doc, "//xi:include[not(ancestor::ignore)]", namespaces = namespace)
files = lapply(nodes, xmlGetAttr, "href")
nonRecursive = as.logical(sapply(nodes, xmlGetAttr, "parse", "") == "text")
# get rid of duplicates. These arise from xpointer includes ofparts of the document.
d = duplicated(files)
files = files[!d]
nodes = nodes[!d]
nonRecursive = nonRecursive[!d]
if(ignoreTextParse) {
files = files[!nonRecursive]
nonRecursive = rep(FALSE, length(files))
}
files = doClean(files, clean)
if(length(omitPattern))
nonRecursive = grepl(omitPattern, unlist(files)) | nonRecursive
if(recursive) {
ans = files
ans[!nonRecursive] = lapply(files[!nonRecursive],
function(x) {
u = getRelativeURL(x, filename)
u = gsub("#.*$", "", u)
xmlXIncludes(u, recursive = TRUE, addNames = addNames, clean = clean, ignoreTextParse = ignoreTextParse)
})
if(addNames)
names(ans) = files
if(length(ans) == 0 || ans == "")
ans = list()
files = ans
# for D3 output. See RD3Device on github.com/duncantl/RD3Device.
files = lapply(files, function(x) if(is.character(x)) list(name = x) else x)
} else
files = unlist(files)
list(name = doClean(filename, clean), children = files)
}
doClean =
function(txt, clean)
{
if(!is.null(clean)) {
if(is.function(clean))
txt = clean(txt)
else if(is.character(clean))
txt = gsub(clean[1], clean[2], txt)
}
txt
}
XML/R/applyDOM.R 0000644 0001751 0000144 00000000152 13607633666 012757 0 ustar hornik users xmlDOMApply <-
function(dom, func)
{
.Call("RS_XML_RecursiveApply", dom, func, NULL, PACKAGE = "XML")
}
XML/R/hashTree.R 0000644 0001751 0000144 00000023576 13607633702 013043 0 ustar hornik users #
# This is an experiment to see if a simple hash of the values
# is faster.
#
# Basically, we keep the parents, children and nodes
# each as hash tables not a list. Otherwise, this resembles
# a flat tree
#
# The notion is that we have a collection of nodes
# and a collection of .parents and .children
#
# Each element in .parents is assigned to the name of the node
# whose parent is being stored. The value is the identifier for the
# parent node. The top node has no entry in this collection.
#
# The .children environment maintains a collection of entries
# indexed by the identifier of the relevant node.
# The value is a character vector containing the identifiers of the
# nodes which are children of this node.
#
xmlHashTree =
#
# Currently ignore the nodes, parents and children.
#
function(nodes = list(), parents = character(), children = list(),
env = new.env(TRUE, parent = emptyenv()))
{
# function to generate a new node identifier. Can be given the
# proposed name and will then make one up if that conflicts with another
# identifier.
.count = 0
# ability to be able to refer to this tree itself.
# Not used here since the functions don't make use of the tt environment implicitly.
# assign(".this", env, env)
# We will store the children and parents as regular entries in these hash tables.
env$.children = .children = new.env(TRUE)
env$.parents = .parents = new.env(TRUE)
#XXX we can do without this and make it a regular function
# but we need to deal with XMLFlatListTree slightly different.
f = function(suggestion = "") {
# the check to see if suggestion is a name in env is very expensive.
if(suggestion == "" || exists(suggestion, env, inherits = FALSE))
as.character(.count + 1) # can use length(tt)
else
suggestion
}
assign(".nodeIdGenerator", f, env)
addNode =
# This adds each new node.
function(node, parent = character(), ..., attrs = NULL, namespace = NULL,
namespaceDefinitions = character(),
.children = list(...),
cdata = FALSE,
suppressNamespaceWarning = getOption('suppressXMLNamespaceWarning', FALSE))
{
if(is.character(node))
node = xmlNode(node, attrs = attrs, namespace = namespace, namespaceDefinitions = namespaceDefinitions)
.kids = .children
.children = .this$.children
node = asXMLTreeNode(node, .this, className = "XMLHashTreeNode")
id = node$id
assign(id, node, env)
.count <<- .count + 1
# if no parent,
if(!inherits(parent, "XMLNode") && (!is.environment(parent) && length(parent) == 0) || identical(parent, ""))
return(node)
if(inherits(parent, "XMLHashTreeNode"))
parent = parent$id
if(length(parent)) {
assign(id, parent, envir = .parents)
if(exists(parent, .children, inherits = FALSE))
tmp = c(get(parent, .children), id)
else
tmp = id
assign(parent, tmp, .children)
}
return(node)
}
env$.addNode <- addNode
# Create a .nodes vector with the names
# of the node. And then makes a
#
.tidy = function() {
idx <- idx - 1
length(nodeSet) <- idx
length(nodeNames) <- idx
names(nodeSet) <- nodeNames
.nodes <<- nodeSet
idx
}
# environment(env$.tidy) <- env
.this = structure(env, class = oldClass("XMLHashTree"))
.this
}
# Example of looping over all elements
# table(unlist(eapply(tree, xmlName)))
getDescendants =
#
# This is trying to avoid recursion and use iteration.
#
function(id, tree, kids = tree$.children)
{
# if no kids, then return empty list
if(!exists(id, kids))
return(character())
ans = character()
tmp = get(id, kids)
hasKids = objects(kids)
while( length( tmp ) > 0) {
ans = c(ans, tmp)
tmp = tmp[ tmp %in% hasKids ]
k = get(tmp[1], kids)
tmp = c(tmp[-1], k)
}
ans
}
getDescendants =
# Simple mechanism, for xmlHashTree trees.
# This is recursive.
function(id, tree, kids = tree$.children)
{
if(inherits(id, "XMLHashTreeNode")) {
if(missing(tree))
tree = id$env
id = id$id
}
if(!exists(id, kids))
return(character())
ans = get(id, kids)
c(ans, unlist(lapply(ans, getDescendants, tree, kids)))
#Debugging
# names(ans) = sapply(ans, function(i) xmlName(get(i, tree)))
}
subtree = copyXMLHashSubTree =
function(node)
{
# find all the nodes below this node, i.e. in the subtree
tree = node$env
ids = getDescendants(node$id, tree, tree$.children)
newTree = xmlHashTree()
# Now copy the parent & children information to the new tree
# and also the modified nodes. The only modification necessary
# is to set the env field of the original node to the new tree.
sapply(c(node$id, ids), function(id) {
n = get(id, tree)
n$env = newTree
assign(id, n, newTree)
if(exists(id, tree$.children))
assign(id, get(id, tree$.children), newTree$.children)
if(exists(id, tree$.parents))
assign(id, get(id, tree$.parents), newTree$.parents)
})
remove(list = node$id, envir = newTree$.parents)
newTree
}
xmlNamespaceDefinitions.XMLAbstractDocument =
function(x, addNames = TRUE, recursive = FALSE, simplify = FALSE, ...)
{
xmlNamespaces(as(x, "XMLAbstractNode"))
}
setAs("XMLAbstractDocument", "XMLAbstractNode",
function(from)
xmlRoot(from))
setAs("XMLHashTreeNode", "XMLHashTree",
function(from)
from$env
)
"$.XMLHashTree" =
function(x, name)
get(name, x, inherits = FALSE)
setMethod("xmlParent", "XMLHashTreeNode",
# To get the parent of the node 'obj', we have to look in the .parents object
# for the variable with obj's node identifier and then get the corresponding
# value which is the identifier of the parent.
function(x, ...)
{
p = get(".parents", x$env)
idx = exists(x$id, p, inherits = FALSE)
if(!idx)
return(NULL)
get(get(x$id, p), x$env)
} )
xmlChildren.XMLHashTreeNode =
#
# For a given node 'obj', we have to use its id to find the entry
# in the .children hash table and then the resulting entry is a character
# vector giving the ids of the child nodes of obj. So we have to resolve those
# children id's back in the hash table for the actual nodes.
function(x, addNames = TRUE, ...)
{
e = x$env
kids = get(".children", e)
if(exists(x$id, kids, inherits = FALSE)) {
ids = get(x$id, kids, inherits = FALSE)
nodes = lapply(ids, get, e, inherits = FALSE)
names(nodes) = sapply(nodes, xmlName)
nodes
} else
list()
}
if(useS4)
setMethod("xmlChildren", "XMLHashTreeNode", xmlChildren.XMLHashTreeNode)
xmlSize.XMLHashTreeNode =
function(obj)
{
length(xmlChildren(obj))
}
xmlSize.XMLHashTree =
function(obj)
{
# 3 is the number of entries with a . prefix that we put there
# for our own implementation purposes
# We could calculate this as
# length(grep("^.", objects(obj, all = TRUE))
length(obj) - 3
}
#??? Currently overridden below
xmlRoot.XMLHashTree =
function(x, skip = TRUE, ...)
{
id = setdiff(objects(x), objects(x[[".parents"]]))
get(id, x)
}
"[[.XMLHashTreeNode" =
function(x, ..., copy = FALSE, exact = TRUE)
{
# ans = NextMethod("[[")
ans = xmlChildren(x)[[...]]
if(copy)
xmlRoot(subtree(ans))
else
ans
}
addNode =
function(node, parent, to, ...)
{
UseMethod("addNode", to)
}
addNode.XMLHashTree =
function(node, parent = character(), to, ...)
{
to[[".addNode"]](node, parent, ...)
}
xmlRoot.XMLHashTree =
#
# This can return multiple roots
#
# Find all the identities of the nodes for which there is no
# corresponding entry n the .parents
#
#
# If skip is TRUE, discard comment nodes. Leave PI nodes, etc.
#
# If all is TRUE, return a list() with all the top-level nodes.
#
function(x, skip = TRUE, all = FALSE, ...)
{
parents = get(".parents", x, inherits = FALSE)
tops = objects(x)[ is.na(match(objects(x), objects(parents)))]
if(length(tops) == 0)
return(NULL)
ans = mget(tops, x)
if(skip)
ans = ans[!sapply(ans, inherits, c("XMLCommentNode"))] #XXX names of XML hash tree nodes for comment, processing instruction, text node, etc.
if(all)
return(ans)
ans[[1]]
}
getSibling =
# Access the next field in the xmlNodePtr object.
# not exported.
function(node, after = TRUE, ...)
UseMethod("getSibling")
getSibling.XMLHashTreeNode =
function(node, after = TRUE, ...)
{
.this = node$env
parent = xmlParent(node)
if(!is.null(parent)) {
kids = xmlChildren(parent)
} else
kids = xmlRoot(.this, skip = FALSE, all = TRUE)
i = match(node$id, sapply(kids, function(x) x$id))
if(is.na(i))
stop("shouldn't happen")
if(after) {
if(i < length(kids))
kids[[i+1]]
else
NULL
} else {
if(i > 1)
kids[[i-1]]
else
NULL
}
}
print.XMLHashTree =
function(x, ...)
{
print(xmlRoot(x), ...)
}
xmlElementsByTagName.XMLHashTree =
#
# non-recursive version only at present
#
function(el, name, recursive = FALSE)
{
kids = xmlChildren(el)
if(!recursive)
return(kids [ sapply(kids, xmlName) == name ])
}
convertToHashTree =
function(from)
{
xx = xmlHashTree()
ans = .Call("R_convertDOMToHashTree", from, xx, xx$.children, xx$.parents, PACKAGE = "XML")
docName(xx) = docName(from)
xx
}
setAs("XMLInternalDocument", "XMLHashTree",
function(from) {
convertToHashTree(xmlRoot(from, skip = FALSE))
})
setAs("XMLInternalNode", "XMLHashTree",
function(from) {
ans = convertToHashTree(from)
docName(ans) <- docName(from)
ans
})
docName.XMLHashTree =
function(doc)
{
if(exists(".doc", doc))
doc$.doc
else
as.character(NA)
}
setMethod("docName", "XMLHashTree", docName.XMLHashTree)
XML/R/catalog.R 0000644 0001751 0000144 00000004132 13607633665 012705 0 ustar hornik users xmlInitializeCatalog =
function()
.C("R_xmlInitializeCatalog", PACKAGE = "XML")
catalogResolve =
function(id, type = "uri", asIs = FALSE, debug = FALSE)
{
xmlInitializeCatalog()
type = rep(type, length = length(id))
types = c("uri", "public", "system")
i = pmatch(tolower(type), types, duplicates.ok = TRUE)
if(any(is.na(i)))
stop("don't recognize type. Must be one of ", paste(types, collapse = ", "))
ans = .Call("R_xmlCatalogResolve", as.character(id), i, as.logical(debug), PACKAGE = "XML")
if(asIs)
ans[is.na(ans)] = id[is.na(ans)]
ans
}
catalogLoad =
function(fileNames)
{
.Call("RS_XML_loadCatalog", path.expand(fileNames), PACKAGE = "XML")
}
catalogClearTable =
function()
{
.Call("RS_XML_clearCatalog", PACKAGE = "XML")
}
XMLCatalogTypes = c("public", "system", "rewriteSystem", "rewriteURI", "uri", "delegateSystem", "delegatePublic", "delegateURI", "nextCatalog", "catalog")
catalogAdd =
function(orig, replace, type = "rewriteURI")
{
if(missing(replace)) {
replace = orig
orig = names(replace)
}
else
length(replace) = length(orig)
idx = pmatch(type, XMLCatalogTypes)
if(any(is.na(idx))) {
stop("unrecognized XML catalog type(s) ", type[is.na(idx)], ". Must be one of ",
paste("'", XMLCatalogTypes, "'", sep = "", collapse = ", "))
}
type = XMLCatalogTypes[idx]
type = rep(as.character(type), length = length(orig))
xmlInitializeCatalog()
.Call("RS_XML_catalogAdd", as.character(orig), as.character(replace), as.character(type), PACKAGE = "XML")
}
catalogDump =
#
# Get a snapshot of the current contents of the global catalog table
# parsing it or writing it to a file for further use
# If asText = TRUE and you don't specify a value for fileName,
# it returns the XML content as a string for easier viewing.
function(fileName = tempfile(), asText = TRUE)
{
xmlInitializeCatalog()
ans = .Call("RS_XML_catalogDump", as.character(fileName), PACKAGE = "XML")
if(missing(fileName)) {
ans = xmlParse(fileName)
if(asText)
ans = saveXML(ans)
unlink(fileName)
}
ans
}
XML/R/xmlHandler.R 0000644 0001751 0000144 00000001522 13607633674 013371 0 ustar hornik users xmlHandler <-
function() {
data <- list()
startElement <- function(name, atts,...) {
if(is.null(atts))
atts <- list()
data[[name]] <<- atts
}
text <- function(x,...) {
cat("MyText:",x,"\n")
}
comment <- function(x,...) {
cat("comment", x,"\n")
}
externalEntity <- function(ctxt, baseURI, sysId, publicId,...) {
cat("externalEntity", ctxt, baseURI, sysId, publicId,"\n")
}
entityDeclaration <- function(name, baseURI, sysId, publicId,notation,...) {
cat("externalEntity", name, baseURI, sysId, publicId, notation,"\n")
}
foo <- function(x,attrs,...) { cat("In foo\n")}
return(list(startElement=startElement, getData=function() {data},
comment=comment, externalEntity=externalEntity,
entityDeclaration=entityDeclaration,
text=text, foo=foo))
}
XML/R/parseDTD.R 0000644 0001751 0000144 00000001332 13607633674 012740 0 ustar hornik users parseDTD <-
function(extId, asText = FALSE, name = "", isURL = FALSE, error = xmlErrorCumulator())
{
extId <- as.character(extId)
if(!asText && missing(isURL)) {
isURL <- length(grep("(http|ftp)://", extId, useBytes = TRUE)) > 0
}
if(missing(name))
name <- extId
.oldErrorHandler = setXMLErrorHandler(error)
on.exit(.Call("RS_XML_setStructuredErrorHandler", .oldErrorHandler, PACKAGE = "XML"), add = TRUE)
if(asText) {
f <- gsub("\\", "/", tempfile(), fixed=TRUE)
cat(extId, "\n", file = f)
extId = f
asText = FALSE
}
.Call("RS_XML_getDTD", as.character(name), as.character(extId),
as.logical(asText), as.logical(isURL), error, PACKAGE = "XML")
}
XML/R/compare.R 0000644 0001751 0000144 00000001511 13607633666 012720 0 ustar hornik users
# For comparing if two XML documents are "similar" whatever that means.
# We look at the distribution of node names
summary.XMLInternalDocument =
function(object, ...)
{
counts = sort(table(xpathSApply(object, "//*", xmlName, ...)), decreasing = TRUE)
list(nameCounts = counts,
numNodes = sum(counts))
}
compareXMLDocs =
function(a, b, ...)
{
sa = summary(a, ...)
sb = summary(b, ...)
inAOnly = setdiff(names(sa$nameCounts), names(sb$nameCounts))
inBOnly = setdiff(names(sb$nameCounts), names(sa$nameCounts))
common.ids = intersect(names(sa$nameCounts), names(sb$nameCounts)) # != sb$nameCounts[names(sa$nameCounts)
diffs = sa$nameCounts[common.ids] - sb$nameCounts[common.ids]
diffs = diffs[diffs != 0]
list(inA = sa$nameCounts[inAOnly], inB = sb$nameCounts[inBOnly], countDiffs = diffs)
#all.equal(sa, sb)
}
XML/R/encoding.R 0000644 0001751 0000144 00000001760 13607633665 013065 0 ustar hornik users CE_NATIVE = 0L
CE_UTF8 = 1L
CE_LATIN1 = 2L
# Map an encoding or document's encoding to the corresponding R internal enum value
setGeneric("getEncodingREnum",
function(doc, ...)
standardGeneric("getEncodingREnum"))
setMethod("getEncodingREnum", "XMLInternalDocument",
function(doc, ...)
getEncodingREnum( getEncoding(doc) ))
setMethod("getEncodingREnum", "XMLInternalElementNode", # was XMLInternalElement, but no such class?
function(doc, ...)
getEncodingREnum( as(doc, "XMLInternalDocument") ))
setMethod("getEncodingREnum", "character",
function(doc, ...) {
if(length(doc) == 0 || is.na(doc))
return(CE_NATIVE)
str = tolower(doc)
if(any(str == c("utf8", "utf-8")))
CE_UTF8
else if(any(str == c("latin1", "iso-8859-1")))
CE_LATIN1
else
CE_NATIVE # or NA?
})
XML/cleanup 0000755 0001751 0000144 00000001153 14636531034 012312 0 ustar hornik users #!/bin/sh
# This cleans up after the auxillary files that were created when installing
# the XML package. The R INSTALL command takes care of the others.
#
# This is a slightly modified version of Torsten Hothorn's original contribution
# to work using the Bourne shell and for non-GNU versions of test.
echo " Cleaning up after installing the XML package"
for f in config.log config.status config.cache ; do
if test -w $f ; then
rm -f $f
fi
done
for f in src/Makevars R/supports.R inst/scripts/RSXML.csh inst/scripts/RSXML.bsh src/Makevars ; do
if test -w $f ; then
rm -f $f
fi
done
exit 0
XML/src/ 0000755 0001751 0000144 00000000000 14553462406 011530 5 ustar hornik users XML/src/XMLHashTree.c 0000644 0001751 0000144 00000017123 14343054535 013761 0 ustar hornik users /*
This set of functions is designed to map an internal
XML DOM tree to an xmlHashTree.
An xmlHashTree() is an environment, and so is mutable.
It contains the following elements:
1) nodes each with a unique identifier, e.g. 1, 2, 3.
which are used to index them in the elements that
specify the structure.
2) .children - an environment
3) .parents - an environment
4) top - the root node
Functions
.addNode and .nodeIdGenerator
If we are not going to be adding to this tree,
we don't need these functions.
As we move through the internal tree, we can construct
a node and then the character vector of children for
that node and assign it to the .children environment
Suppose we have a tree like
A
/ \
B C
/ | \
D E F
|
G
We start giving the nodes names by number, i.e. 1, 2, 3,....
To create A, we make the node object as a list
with name (A), attributes (NULL), namespace (""), [children],
id (1) and env.
The children are
2 and 3
The parents
*/
#include
#include
#include "DocParse.h"
#include "Utils.h"
static const char * const nodeElementNames[] = {
"name", "attributes", "namespace", "children", "id", "env"
};
/*
Identifier for the node. Could be anything, so we use the pointer address to ensure
uniqueness. Ignore the next definition in the comment!
#define SET_NODE_NAME(x, id) sprintf(x, "%d", id)
*/
#define SET_NODE_NAME(x, id, node) snprintf(x, 20, "%p", (void *)node)
SEXP
makeHashNode(xmlNodePtr node, char *buf, SEXP env, R_XMLSettings *parserSettings)
{
SEXP ans, names, tmp;
int i = 0, numEls = sizeof(nodeElementNames)/sizeof(nodeElementNames[0]);
DECL_ENCODING_FROM_NODE(node)
int hasValue = node->type == XML_TEXT_NODE || node->type == XML_COMMENT_NODE
|| node->type == XML_CDATA_SECTION_NODE || node->type == XML_PI_NODE;
if(hasValue)
numEls++;
if(node->nsDef)
numEls++;
PROTECT(ans = NEW_LIST(numEls));
PROTECT(tmp = mkString(node->name ? XMLCHAR_TO_CHAR(node->name) : ""));
if(node->ns)
SET_NAMES(tmp, mkString((const char *)node->ns->prefix));
SET_VECTOR_ELT(ans, i++, tmp);
UNPROTECT(1);
SET_VECTOR_ELT(ans, i++, RS_XML(AttributeList)(node, parserSettings));
SET_VECTOR_ELT(ans, i++, ScalarString(ENC_COPY_TO_USER_STRING(node->ns && node->ns->prefix ? node->ns->prefix: (const xmlChar *)"")));
/* skip the children */
i = 4;
SET_VECTOR_ELT(ans, i++, mkString(buf));
SET_VECTOR_ELT(ans, i++, env);
if(hasValue)
SET_VECTOR_ELT(ans, i++, mkString((const char *)node->content));
if(node->nsDef)
SET_VECTOR_ELT(ans, i++, processNamespaceDefinitions(node->nsDef, node, parserSettings));
PROTECT(names = NEW_CHARACTER(numEls));
for(i = 0; i < sizeof(nodeElementNames)/sizeof(nodeElementNames[0]); i++)
SET_STRING_ELT(names, i, ENC_COPY_TO_USER_STRING(nodeElementNames[i]));
if(hasValue)
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("value"));
if(node->nsDef)
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("namespaceDefinitions"));
SET_NAMES(ans, names);
UNPROTECT(1);
/* The class of the node */
PROTECT(names = NEW_CHARACTER( node->type == XML_ELEMENT_NODE ? 2 : 3));
i = 0;
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("XMLHashTreeNode"));
if(node->type == XML_TEXT_NODE)
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("XMLTextNode"));
else if(node->type == XML_COMMENT_NODE)
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("XMLCommentNode"));
else if(node->type == XML_CDATA_SECTION_NODE)
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("XMLCDataNode"));
else if(node->type == XML_PI_NODE)
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("XMLPINode"));
SET_STRING_ELT(names, i++, COPY_TO_USER_STRING("XMLNode"));
SET_CLASS(ans, names);
UNPROTECT(2);
return(ans);
}
unsigned int
countChildNodes(xmlNodePtr root, unsigned int *ctr)
{
xmlNodePtr node;
for(node = root->children; node; node = node->next) {
if(node->type == XML_XINCLUDE_START)
countChildNodes(node, ctr);
else if(node->type != XML_XINCLUDE_END)
(*ctr)++;
}
return(*ctr);
}
void
collectChildNodes(xmlNodePtr root, unsigned int *ctr, SEXP kids)
{
xmlNodePtr node;
for(node = root->children; node; node = node->next) {
if(node->type == XML_XINCLUDE_END)
continue;
if(node->type == XML_XINCLUDE_START)
collectChildNodes(node, ctr, kids);
else {
char buf[20];
SET_NODE_NAME(buf, *ctr + 1, node);
SET_STRING_ELT(kids, *ctr, mkChar(buf));
(*ctr)++;
}
}
}
/*
This is the recursive function that process a node and then its children.
It builds the node (via makeHashNode) and then adds an entry for the
children
and
parent
These provide the structure for the tree.
*/
void
processNode(xmlNodePtr root, xmlNodePtr parent, unsigned int *ctr, int parentId, char *id, SEXP env, SEXP childrenEnv, SEXP parentEnv, R_XMLSettings *parserSettings)
{
// int i;
xmlNodePtr node;
SEXP rnode, kids;
unsigned int curId = *ctr;
char buf[20];
SET_NODE_NAME(id, curId, root);
if(root->type != XML_XINCLUDE_START && root->type != XML_XINCLUDE_END) {
rnode = PROTECT(makeHashNode(root, id, env, parserSettings));
defineVar(Rf_install(id), rnode, env);
UNPROTECT(1);
if(root->parent && root->parent->type != XML_DOCUMENT_NODE && root->parent->type != XML_HTML_DOCUMENT_NODE) {
/* Put an entry in the .parents environment for this current id with the single value
which is the value of the parentId as a string, equivalent of
assign(curId, parentId, parentEnv)
*/
SET_NODE_NAME(id, curId, root);
SET_NODE_NAME(buf, parentId, parent);
defineVar(Rf_install(id), PROTECT(mkString(buf)), parentEnv);
UNPROTECT(1);
}
if(root->children) {
/* We have to deal with */
unsigned int i = 0;
countChildNodes(root, &i);
PROTECT(kids = NEW_CHARACTER(i));
i = 0; collectChildNodes(root, &i, kids);
defineVar(Rf_install(id), kids, childrenEnv);
UNPROTECT(1);
}
(*ctr)++;
}
if(root->type != XML_XINCLUDE_END) {
/* Discard XML_INCLUDE_END nodes, but for XML_INCLUDE_START, we need to specify a different parent,
i.e., the parent of the XML_INCLUDE_START node so that it will act as the parent of the
included nodes.
*/
xmlNodePtr parent;
parent = root->type == XML_XINCLUDE_START ? root->parent : root;
for(node = root->children; node; node = node->next)
processNode(node, parent, ctr, curId, id, env, childrenEnv, parentEnv, parserSettings);
}
}
/*
This is the top-level C entry point for starting the conversion of the node.
*/
unsigned int
convertDOMToHashTree(xmlNodePtr root, SEXP env, SEXP childrenEnv, SEXP parentEnv, R_XMLSettings *parserSettings)
{
// SEXP rnode;
unsigned int ctr = 0;
xmlNodePtr tmp;
char id[20];
memset(id, '\0', sizeof(id));
for(tmp = root; tmp; tmp = tmp->next)
processNode(tmp, (xmlNodePtr) NULL, &ctr, -1, id, env, childrenEnv, parentEnv, parserSettings);
return(ctr);
}
/*
This is the R entry point for the conversion of the node and its subnode.
*/
SEXP
R_convertDOMToHashTree(SEXP rnode, SEXP env, SEXP childrenEnv, SEXP parentEnv)
{
unsigned int ctr;
xmlNodePtr node;
R_XMLSettings parserSettings;
parserSettings.addAttributeNamespaces = 0;
parserSettings.converters = NULL_USER_OBJECT;
node = (xmlNodePtr) R_ExternalPtrAddr(rnode);
ctr = convertDOMToHashTree(node, env, childrenEnv, parentEnv, &parserSettings);
return(ScalarInteger(ctr));
}
XML/src/RSDTD.c 0000644 0001751 0000144 00000055500 14106741723 012555 0 ustar hornik users /**
This file defines the top-level entry routine called from R and S to parse and convert
a DTD into a user-level object.
Most of the routines are support routines. We leave them as global symbols (as opposed to static) so that others might be
able to utilize them. Some are called from the other files (DocParse, specifically).
* See Copyright for the license status of this software.
*/
#include "RSDTD.h"
#ifdef USE_S
extern char *strdup(const char *);
#endif
#include "Utils.h" /* for SET_CLASS_NAME */
#include
/* For reading DTDs directly from text, not files.
Copied directly from parser.c in the libxml(-1.7.3) library.
*/
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
#define INPUT_CHUNK 250
#define CUR (ctxt->token ? ctxt->token : (*ctxt->input->cur))
#ifdef OLD_SKIP_BLANKS
#define SKIP_BLANKS \
do { \
while (IS_BLANK(CUR)) NEXT; \
if (*ctxt->input->cur == '%') xmlParserHandlePEReference(ctxt); \
if (*ctxt->input->cur == '&') xmlParserHandleReference(ctxt); \
} while (IS_BLANK(CUR));
#define NEXT { \
if (ctxt->token != 0) ctxt->token = 0; \
else { \
if ((*ctxt->input->cur == 0) && \
(xmlParserInputGrow(ctxt->input, INPUT_CHUNK) <= 0)) { \
xmlPopInput(ctxt); \
} else { \
if (*(ctxt->input->cur) == '\n') { \
ctxt->input->line++; ctxt->input->col = 1; \
} else ctxt->input->col++; \
ctxt->input->cur++; \
if (*ctxt->input->cur == 0) \
xmlParserInputGrow(ctxt->input, INPUT_CHUNK); \
} \
if (*ctxt->input->cur == '%') xmlParserHandlePEReference(ctxt); \
if (*ctxt->input->cur == '&') xmlParserHandleReference(ctxt); \
}}
#else
#define SKIP_BLANKS xmlSkipBlankChars(ctxt)
#define NEXT xmlNextChar(ctxt)
#endif
/* end temporary. */
/* Macro that sets the name of an enumerated value by indexing into an array of
names based on the value being represented.
*/
#define SET_ENUM_NAME(names, which, obj) RS_XML_SetNames(1, RS_XML_##names+which-1,obj);
enum {DTD_ELEMENTS_SLOT, DTD_ENTITIES_SLOT, DTD_NUM_SLOTS};
const char *RS_XML(DtdNames)[] = {"elements", "entities"};
/**
Top-level entry point for reading the DTD.
dtdFileName - name of the DTD.
externalId - file identfying the DTD from which its contents are read.
*/
USER_OBJECT_
RS_XML(getDTD)(USER_OBJECT_ dtdFileName, USER_OBJECT_ externalId,
USER_OBJECT_ asText, USER_OBJECT_ isURL, USER_OBJECT_ errorFun)
{
USER_OBJECT_ ans;
const char * dtdName = strdup(CHAR_DEREF(STRING_ELT(dtdFileName, 0)));
const char * extId = strdup(CHAR_DEREF(STRING_ELT(externalId, 0)));
int localAsText = LOGICAL_DATA(asText)[0];
xmlParserCtxtPtr ctxt;
xmlDtdPtr dtd;
if(localAsText) {
ctxt = xmlCreateDocParserCtxt((xmlChar*) extId);
} else {
if(LOGICAL_DATA(isURL)[0] == 0) {
struct stat tmp_stat;
if(extId == NULL || stat(extId, &tmp_stat) < 0) {
Rf_error("Can't find file %s", extId);
}
}
ctxt = xmlCreateFileParserCtxt(extId); /* from parser.c xmlSAXParseFile */
}
if(ctxt == NULL) {
Rf_error("error creating XML parser for `%s'", extId);
}
ctxt->validate = 1;
#ifdef RS_XML_SET_STRUCTURED_ERROR /* Done in R code now. */
xmlSetStructuredErrorFunc(errorFun == NULL_USER_OBJECT ? NULL : errorFun, R_xmlStructuredErrorHandler);
#endif
if(ctxt->myDoc == NULL)
ctxt->myDoc = xmlNewDoc(BAD_CAST "1.0");
if(localAsText) {
xmlCreateIntSubset(ctxt->myDoc, CHAR_TO_XMLCHAR(dtdName), NULL, NULL);
while(ctxt->input->cur && ctxt->input->cur[0]) {
SKIP_BLANKS;
xmlParseMarkupDecl(ctxt);
}
dtd = ctxt->myDoc->intSubset;
} else {
/* Read the file. */
/* Added for 2.2.12. May need to be conditional for 1.8.9 */
ctxt->sax->internalSubset(ctxt->userData, CHAR_TO_XMLCHAR(dtdName), CHAR_TO_XMLCHAR(extId), CHAR_TO_XMLCHAR(extId));
/* Warnings will ensue about not being in internal subset if we don't go to level 2. */
#ifdef USE_EXTERNAL_SUBSET
ctxt->inSubset = 2;
ctxt->sax->externalSubset(ctxt->userData, CHAR_TO_XMLCHAR(dtdName), CHAR_TO_XMLCHAR(extId), CHAR_TO_XMLCHAR(extId));
ctxt->inSubset = 0;
#endif
dtd = ctxt->myDoc->extSubset;
}
#ifdef RS_XML_SET_STRUCTURED_ERROR
xmlSetStructuredErrorFunc(NULL, NULL);
#endif
if(dtd == NULL) {
if(errorFun != NULL_USER_OBJECT) {
RSXML_structuredStop(errorFun, NULL);
} else
return(stop("DTDParseError", "error parsing %s", dtdName));
Rf_error("error in DTD %s", extId);
}
if(localAsText) {
/* Don't bother with the internal and external split, just do the internal and return it. */
ans = RS_XML(createDTDParts)(dtd, ctxt);
} else
ans = RS_XML(ConstructDTDList)(ctxt->myDoc, 0, ctxt);
return(ans);
}
const char *RS_XML(DtdTypeNames)[] = {"external", "internal"};
/**
Create the representation of the DTD contained in the Document pointer,
using both the internal and external descriptions and returning a list
of the appropriate length. If the external description is empty, then we just
return the description of the internal description. Otherwise, we return a named
list of length 2 containing descriptions of both.
*/
USER_OBJECT_
RS_XML(ConstructDTDList)(xmlDocPtr myDoc, int processInternals, xmlParserCtxtPtr ctxt)
{
USER_OBJECT_ ans, el, klass;
int i;
xmlDtdPtr sets[2];
int num = processInternals ? 2 : 1;
sets[0] = myDoc->extSubset;
if(processInternals) {
sets[1] = myDoc->intSubset;
}
PROTECT(ans = NEW_LIST(num));
for(i = 0; i < num; i++) {
if(sets[i]) {
SET_VECTOR_ELT(ans, i, el= RS_XML(createDTDParts)(sets[i], ctxt));
PROTECT(klass = NEW_CHARACTER(1));
SET_STRING_ELT(klass, 0, mkChar(i==0 ? "ExternalDTD" : "InternalDTD"));
SET_CLASS(el, klass);
UNPROTECT(1);
}
}
RS_XML(SetNames)(num, RS_XML(DtdTypeNames), ans);
UNPROTECT(1);
return(processInternals ? ans : VECTOR_ELT(ans, 0));
}
/**
Process the entities and elements of the DTD, returning a list
of length 2, irrespective if either is empty.
*/
USER_OBJECT_
RS_XML(createDTDParts)(xmlDtdPtr dtd, xmlParserCtxtPtr ctxt)
{
xmlEntitiesTablePtr entities;
xmlElementTable *table;
USER_OBJECT_ ans;
PROTECT(ans = NEW_LIST(DTD_NUM_SLOTS));
table = (xmlElementTable*) dtd->elements;
if(table)
SET_VECTOR_ELT(ans, DTD_ELEMENTS_SLOT, RS_XML(ProcessElements)(table, ctxt));
entities = (xmlEntitiesTablePtr) dtd->entities;
if(entities)
SET_VECTOR_ELT(ans, DTD_ENTITIES_SLOT, RS_XML(ProcessEntities)(entities, ctxt));
RS_XML(SetNames)(DTD_NUM_SLOTS, RS_XML(DtdNames), ans);
UNPROTECT(1);
return(ans);
}
#ifdef LIBXML2
struct ElementTableScanner {
USER_OBJECT_ dtdEls;
USER_OBJECT_ dtdNames;
int counter;
};
#if LIBXML_VERSION >= 20908
# define CONST const
#else
# define CONST
#endif
#ifndef NO_XML_HASH_SCANNER_RETURN
void *RS_xmlElementTableConverter(void *payload, void *data, CONST xmlChar *name);
void* RS_xmlEntityTableConverter(void *payload, void *data, CONST xmlChar *name);
#else
void RS_xmlElementTableConverter(void *payload, void *data, CONST xmlChar *name);
void RS_xmlEntityTableConverter(void *payload, void *data, CONST xmlChar *name);
#endif
#endif
/**
Convert the elements into a named list of objects with each element
representing an element.
*/
USER_OBJECT_
RS_XML(ProcessElements)(xmlElementTablePtr table, xmlParserCtxtPtr ctxt)
{
USER_OBJECT_ dtdEls = NULL_USER_OBJECT;
int n;
#ifdef LIBXML2
n = xmlHashSize(table);
#else
int i;
xmlElementPtr xmlEl;
n = table->nb_elements;
#endif
if(n > 0) {
USER_OBJECT_ dtdNames = NULL_USER_OBJECT;
PROTECT_INDEX ipx;
PROTECT_WITH_INDEX(dtdEls = NEW_LIST(n), &ipx);
PROTECT(dtdNames = NEW_CHARACTER(n));
#ifdef LIBXML2
{
struct ElementTableScanner scanData;
scanData.dtdEls = dtdEls;
scanData.dtdNames = dtdNames;
scanData.counter = 0;
xmlHashScan(table, RS_xmlElementTableConverter, &scanData);
SET_LENGTH(dtdEls, scanData.counter);
REPROTECT(dtdEls, ipx);
SET_LENGTH(dtdNames, scanData.counter);
}
#else
for(i = 0; i < n; i++) {
xmlEl = table->table[i];
SET_VECTOR_ELT(dtdEls, i, RS_XML(createDTDElement)(xmlEl));
SET_STRING_ELT(dtdNames , i, COPY_TO_USER_STRING(xmlEl->name));
}
#endif
SET_NAMES(dtdEls, dtdNames);
UNPROTECT(2);
}
return(dtdEls);
}
#ifdef LIBXML2
/* libxml2 2.4.21 (and perhaps earlier) redefines this to have a return type of void,
rather than void*. Need to figure out if this makes any real difference to the interface
and also when to
*/
#ifndef NO_XML_HASH_SCANNER_RETURN
void*
#else
void
#endif
RS_xmlElementTableConverter(void *payload, void *data, CONST xmlChar *name)
{
struct ElementTableScanner *scanData = (struct ElementTableScanner *)data;
SET_VECTOR_ELT(scanData->dtdEls, scanData->counter, RS_XML(createDTDElement)( payload));
SET_STRING_ELT(scanData->dtdNames, scanData->counter, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(name)));
scanData->counter++;
#ifndef NO_XML_HASH_SCANNER_RETURN
return(payload);
#endif
}
#endif
/**
Process the list of entities and convert them into a named list containing
entity descriptions.
*/
USER_OBJECT_
RS_XML(ProcessEntities)(xmlEntitiesTablePtr table, xmlParserCtxtPtr ctxt)
{
USER_OBJECT_ dtdEls = NULL_USER_OBJECT;
int n;
#ifdef LIBXML2
n = xmlHashSize(table);
#else
xmlEntity *xmlEl;
int i;
n = table->nb_entities;
#endif
if(n > 0) {
USER_OBJECT_ dtdNames;
PROTECT_INDEX ipx;
PROTECT_WITH_INDEX(dtdEls = NEW_LIST(n), &ipx);
PROTECT(dtdNames = NEW_CHARACTER(n));
#ifdef LIBXML2
{
struct ElementTableScanner scanData;
scanData.dtdEls = dtdEls;
scanData.dtdNames = dtdNames;
scanData.counter = 0;
xmlHashScan(table, RS_xmlEntityTableConverter, &scanData);
/* Reset the length to be the actual number rather than the
capacity of the table.
See ProcessElements also.
*/
SET_LENGTH(dtdEls, scanData.counter);
REPROTECT(dtdEls, ipx);
SET_LENGTH(dtdNames, scanData.counter);
}
#else
for(i = 0; i < n; i++) {
xmlEl = table->table +i;
SET_VECTOR_ELT(dtdEls, i, RS_XML(createDTDEntity)(xmlEl));
SET_STRING_ELT(dtdNames, i, COPY_TO_USER_STRING(xmlEl->name));
}
#endif
SET_NAMES(dtdEls, dtdNames);
UNPROTECT(2);
}
return(dtdEls);
}
#ifdef LIBXML2
#ifndef NO_XML_HASH_SCANNER_RETURN
void*
#else
void
#endif
RS_xmlEntityTableConverter(void *payload, void *data, CONST xmlChar *name)
{
struct ElementTableScanner *scanData = (struct ElementTableScanner *)data;
SET_VECTOR_ELT(scanData->dtdEls, scanData->counter, RS_XML(createDTDEntity)( payload));
SET_STRING_ELT(scanData->dtdNames, scanData->counter, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(name)));
scanData->counter++;
#ifndef NO_XML_HASH_SCANNER_RETURN
return(payload);
#endif
}
#endif /* End of LIBXML2 for definint RS_xmlEntityTableConverter */
/**
Convert an entity definition into a user-level object, handling both internal and system entities.
We could have different slots for the two types of entities, but that may make it harder to program.
S3/R classes aren't exactly good with inheritance of slots.
*/
/**
Indices for the slots of the user-level list representing the entity.
*/
enum { DTD_ENTITY_NAME_SLOT, DTD_ENTITY_CONTENT_SLOT, DTD_ENTITY_ORIG_SLOT, DTD_ENTITY_NUM_SLOTS};
/*
Names for the slots of the user-level list representing the entity.
*/
const char *RS_XML(EntityNames)[] = {"name", "value", "original"};
USER_OBJECT_
RS_XML(createDTDEntity)(xmlEntityPtr entity)
{
USER_OBJECT_ ans;
const xmlChar *value;
const char *localClassName;
PROTECT(ans = NEW_LIST(DTD_ENTITY_NUM_SLOTS));
SET_VECTOR_ELT(ans, DTD_ENTITY_NAME_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, DTD_ENTITY_NAME_SLOT), 0, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(entity->name)));
if(entity->content == NULL) {
value = entity->SystemID;
localClassName = "XMLSystemEntity";
} else {
value = entity->content;
localClassName = "XMLEntity";
}
SET_VECTOR_ELT(ans, DTD_ENTITY_CONTENT_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, DTD_ENTITY_CONTENT_SLOT), 0, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(value)));
if(entity->orig) {
SET_VECTOR_ELT(ans, DTD_ENTITY_ORIG_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, DTD_ENTITY_ORIG_SLOT), 0, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(entity->orig)));
}
RS_XML(SetNames)(DTD_ENTITY_NUM_SLOTS, RS_XML(EntityNames), ans);
/* Set the class of the specified object based on whether it is a internal or external entity. */
SET_CLASS_NAME(localClassName, ans);
UNPROTECT(1);
return(ans);
}
enum { DTD_ELEMENT_NAME_SLOT, DTD_ELEMENT_TYPE_SLOT, DTD_ELEMENT_CONTENT_SLOT, DTD_ELEMENT_ATTRIBUTES_SLOT, DTD_ELEMENT_NUM_SLOTS};
const char *RS_XML(ElementNames)[] = {"name", "type", "contents","attributes"};
const char *RS_XML(ElementTypeNames)[] = {"empty", "any", "mixed","element"};
/**
Creates the user-level object representing the definition of an element within a DTD,
including its attribute definitions, its type, name and finally contents.
This is an object of class XMLElementDef.
*/
USER_OBJECT_
RS_XML(createDTDElement)(xmlElementPtr el)
{
USER_OBJECT_ rel;
int type;
#ifdef XML_ELEMENT_ETYPE
type = el->etype;
#else
type = el->type;
#endif
PROTECT(rel = NEW_LIST(DTD_ELEMENT_NUM_SLOTS));
SET_VECTOR_ELT(rel, DTD_ELEMENT_NAME_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(rel, DTD_ELEMENT_NAME_SLOT), 0, COPY_TO_USER_STRING( XMLCHAR_TO_CHAR( ( el->name ? el->name : (xmlChar*)""))));
SET_VECTOR_ELT(rel, DTD_ELEMENT_TYPE_SLOT, NEW_INTEGER(1));
INTEGER_DATA(VECTOR_ELT(rel, DTD_ELEMENT_TYPE_SLOT))[0] = el->type;
SET_ENUM_NAME(ElementTypeNames, type, VECTOR_ELT(rel, DTD_ELEMENT_TYPE_SLOT));
if(el->content != NULL)
SET_VECTOR_ELT(rel, DTD_ELEMENT_CONTENT_SLOT, RS_XML(createDTDElementContents)(el->content, el, 1));
SET_VECTOR_ELT(rel, DTD_ELEMENT_ATTRIBUTES_SLOT, RS_XML(createDTDElementAttributes)(el->attributes, el));
RS_XML(SetNames)(DTD_ELEMENT_NUM_SLOTS, RS_XML(ElementNames), rel);
SET_CLASS_NAME("XMLElementDef", rel);
UNPROTECT(1);
return(rel);
}
/* Indices for the slots/elements in the list. */
enum {DTD_CONTENT_TYPE_SLOT, DTD_CONTENT_OCCURANCE_SLOT, DTD_CONTENT_ELEMENTS_SLOT, DTD_CONTENT_NUM_SLOTS};
/* names for the elements */
const char *RS_XML(ContentNames)[] = {"type", "ocur", "elements"};
/* Names for the enumerated types of the entries in the data */
const char *RS_XML(ContentTypeNames)[] = {"PCData", "Element", "Sequence","Or"};
const char *RS_XML(OccuranceNames)[] = {"Once", "Zero or One", "Mult","One or More"};
/**
Create an object representing the DTD element. The returned value is a list
with 3 elements. The names are given by the array ContentNames above. The
type and ocur elements are simple named integers identifying that the element
is simple parsed character data, an element or a composite element which is
either an one of several possible types (that is an OR or |) or an ordered
sequence of types. The ocur field indicates whether this element is to be
expected in this position exactly once (default qualifier), zero or one
(i.e. optional) (?) , any number of times (including omitted) (*) and finally
, at least once, but possible more(+)
The recursive argument allows the RS_XML(SequenceContent) routine to use part of this
routine.
*/
USER_OBJECT_
RS_XML(createDTDElementContents)(xmlElementContentPtr vals, xmlElementPtr el, int recursive)
{
char *localClassName = NULL;
int num = 0;
USER_OBJECT_ ans = NULL_USER_OBJECT;
PROTECT(ans = NEW_LIST(DTD_CONTENT_NUM_SLOTS));
SET_VECTOR_ELT(ans, DTD_CONTENT_TYPE_SLOT, NEW_INTEGER(1));
INTEGER_DATA(VECTOR_ELT(ans, DTD_CONTENT_TYPE_SLOT))[0] = vals->type;
SET_ENUM_NAME(ContentTypeNames, vals->type, VECTOR_ELT(ans, DTD_CONTENT_TYPE_SLOT));
SET_VECTOR_ELT(ans, DTD_CONTENT_OCCURANCE_SLOT, NEW_INTEGER(1));
INTEGER_DATA(VECTOR_ELT(ans, DTD_CONTENT_OCCURANCE_SLOT))[0] = vals->ocur;
SET_ENUM_NAME(OccuranceNames, vals->ocur, VECTOR_ELT(ans, DTD_CONTENT_OCCURANCE_SLOT));
if(vals->type == XML_ELEMENT_CONTENT_SEQ && recursive) {
SET_VECTOR_ELT(ans, DTD_CONTENT_ELEMENTS_SLOT, RS_XML(SequenceContent)(vals, el));
} else {
num += (vals->c1 != NULL);
if(recursive || 1)
num += (vals->c2 != NULL);
if(num > 0) {
SET_VECTOR_ELT(ans, DTD_CONTENT_ELEMENTS_SLOT, NEW_LIST(num));
num = 0;
if(vals->c1) {
SET_VECTOR_ELT(VECTOR_ELT(ans, DTD_CONTENT_ELEMENTS_SLOT), num++, RS_XML(createDTDElementContents)(vals->c1, el, 1));
}
if(recursive || 1) {
if(vals->c2) {
SET_VECTOR_ELT(VECTOR_ELT(ans, DTD_CONTENT_ELEMENTS_SLOT), num++, RS_XML(createDTDElementContents)(vals->c2, el, 1));
}
}
} else {
if(vals->name) {
SET_VECTOR_ELT(ans, DTD_CONTENT_ELEMENTS_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, DTD_CONTENT_ELEMENTS_SLOT), 0, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(vals->name)));
}
}
}
switch(vals->type) {
case XML_ELEMENT_CONTENT_SEQ:
localClassName = "XMLSequenceContent";
break;
case XML_ELEMENT_CONTENT_OR:
localClassName = "XMLOrContent";
break;
default:
localClassName = "XMLElementContent";
}
if(localClassName) {
SET_CLASS_NAME(localClassName, ans);
}
RS_XML(SetNames)(DTD_CONTENT_NUM_SLOTS, RS_XML(ContentNames), ans);
UNPROTECT(1);
return(ans);
}
/**
Process the DTD element, knowing that it is a sequence definition.
Compute the number of elements in the sequence by flattening out the
lob-sided tree and then convert the each element and append it to the list.
*/
USER_OBJECT_
RS_XML(SequenceContent)(xmlElementContentPtr vals, xmlElementPtr el)
{
xmlElementContentPtr ptr = vals->c2;
int ok = 1, n=1, deep = 0;
USER_OBJECT_ ans = NULL_USER_OBJECT;
USER_OBJECT_ tmp;
/* Count the number of elements in this sequence.
Descend all the c2's below this one.
*/
while(ptr && ok) {
ok = (ptr->type == XML_ELEMENT_CONTENT_SEQ);
ptr = ptr->c2;
n++;
}
/* Now build the list and the elements within it.*/
PROTECT(ans = NEW_LIST(n));
SET_VECTOR_ELT(ans, 0, RS_XML(createDTDElementContents)(vals->c1, el, 1));
ptr = vals->c2;
n = 1;
do {
/* Some jumping around here beacuse of the recursion and split types. Should be cleaner. */
deep = (ptr->c1 != NULL && ptr->type == XML_ELEMENT_CONTENT_SEQ );
tmp = RS_XML(createDTDElementContents)( deep ? ptr->c1 : ptr, el, deep);
SET_VECTOR_ELT(ans, n, tmp);
ok = (ptr->type == XML_ELEMENT_CONTENT_SEQ);
ptr = ptr->c2;
n++;
} while(ptr && ok);
UNPROTECT(1);
return(ans);
}
/**
Routine that creates a named list of XMLAttributeDef objects from a collection of
attribute definitions associated with the specified XML element definition.
*/
USER_OBJECT_
RS_XML(createDTDElementAttributes)(xmlAttributePtr vals, xmlElementPtr el)
{
USER_OBJECT_ ans = NULL_USER_OBJECT;
USER_OBJECT_ names;
xmlAttributePtr tmp = vals;
int n = 0, i;
while(tmp) {
#ifdef LIBXML2
tmp = tmp->nexth;
#else
tmp = tmp->next;
#endif
n++;
}
if(n > 0) {
tmp = vals;
PROTECT(ans = NEW_LIST(n));
PROTECT(names = NEW_CHARACTER(n));
for(i=0; i < n; i++) {
SET_VECTOR_ELT(ans, i, RS_XML(createDTDAttribute)(tmp, el));
SET_STRING_ELT(names, i, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(tmp->name)));
#ifdef LIBXML2
tmp = tmp->nexth;
#else
tmp = tmp->next;
#endif
}
SET_NAMES(ans, names);
UNPROTECT(2);
}
return(ans);
}
enum {DTD_ATTRIBUTE_NAME_SLOT, DTD_ATTRIBUTE_TYPE_SLOT, DTD_ATTRIBUTE_DEFAULT_SLOT, DTD_ATTRIBUTE_DEFAULT_VALUE_SLOT, DTD_ATTRIBUTE_NUM_SLOTS};
/* Names for the possible types of an attribute. */
const char *RS_XML(AttributeTypeNames) [] = {"CDATA","Id", "IDRef", "IDRefs", "Entity","Entities", "NMToken", "NMTokens", "Enumeration", "Notation"};
/* Names for the possible modes or default types of an attribute. */
const char *RS_XML(AttributeDefaultNames)[] = {"None", "Required", "Implied", "Fixed"};
/* Names of the elements within the returned list */
const char *RS_XML(AttributeSlotNames)[] = {"name", "type", "defaultType", "defaultValue"};
/**
Create a user-level version of a DTD attribute within an Attribute list within the DTD.
Return a vector of length 4 with elements named
Name, Type, Default Type and Default Value.
The first is a simple string (character vector of length 1). The next two are enumerated
types describing the type of the attribute value and whether it is required, fixed, implied, etc.
The final value is the default value
*/
USER_OBJECT_
RS_XML(createDTDAttribute)(xmlAttributePtr val, xmlElementPtr el)
{
USER_OBJECT_ ans;
int attrType;
#ifdef XML_ATTRIBUTE_ATYPE
attrType = val->atype;
#else
attrType = val->type;
#endif
PROTECT(ans = NEW_LIST(DTD_ATTRIBUTE_NUM_SLOTS));
SET_VECTOR_ELT(ans, DTD_ATTRIBUTE_NAME_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, DTD_ATTRIBUTE_NAME_SLOT), 0, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(val->name)));
SET_VECTOR_ELT(ans, DTD_ATTRIBUTE_TYPE_SLOT, NEW_INTEGER(1));
INTEGER_DATA(VECTOR_ELT(ans, DTD_ATTRIBUTE_TYPE_SLOT))[0] = val->type;
SET_ENUM_NAME(AttributeTypeNames, attrType, VECTOR_ELT(ans, DTD_ATTRIBUTE_TYPE_SLOT));
SET_VECTOR_ELT(ans, DTD_ATTRIBUTE_DEFAULT_SLOT, NEW_INTEGER(1));
INTEGER_DATA(VECTOR_ELT(ans, DTD_ATTRIBUTE_DEFAULT_SLOT))[0] = val->def;
SET_ENUM_NAME(AttributeDefaultNames, val->def, VECTOR_ELT(ans, DTD_ATTRIBUTE_DEFAULT_SLOT));
if(val->type == (xmlElementType)XML_ATTRIBUTE_ENUMERATION) {
SET_VECTOR_ELT(ans, DTD_ATTRIBUTE_DEFAULT_VALUE_SLOT, RS_XML(AttributeEnumerationList)(val->tree, val, el));
} else {
SET_VECTOR_ELT(ans, DTD_ATTRIBUTE_DEFAULT_VALUE_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, DTD_ATTRIBUTE_DEFAULT_VALUE_SLOT), 0, COPY_TO_USER_STRING( XMLCHAR_TO_CHAR( (val->defaultValue ? val->defaultValue : (xmlChar*)""))));
}
RS_XML(SetNames)(DTD_ATTRIBUTE_NUM_SLOTS, RS_XML(AttributeSlotNames), ans);
SET_CLASS_NAME("XMLAttributeDef", ans);
UNPROTECT(1);
return(ans);
}
/**
Return a character vector containing the elements listed in the enumeration of possible
values in the attribute. These arise in DTD entries such as
*/
USER_OBJECT_
RS_XML(AttributeEnumerationList)(xmlEnumerationPtr list, xmlAttributePtr attr, xmlElementPtr element)
{
USER_OBJECT_ ans = NULL_USER_OBJECT;
xmlEnumerationPtr tmp = list;
int n = 0;
/* Count the number of entries in the list/table. */
while(tmp) {
n++;
tmp = tmp->next;
}
/* Now convert each entry and add it to a list. */
if(n > 0) {
int i;
PROTECT(ans = NEW_CHARACTER(n));
tmp = list;
for(i = 0; i < n; i++) {
SET_STRING_ELT(ans, i, COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(tmp->name)));
tmp = tmp->next;
}
UNPROTECT(1);
}
return(ans);
}
XML/src/Utils.h 0000644 0001751 0000144 00000026610 14316271033 012775 0 ustar hornik users /*
* See Copyright for the license status of this software.
*/
#ifndef UTILS_H
#define UTILS_H
#include "RS_XML.h"
#include "RSCommon.h"
#define XMLCHAR_TO_CHAR(val) ((char *) val)
#define CHAR_TO_XMLCHAR(val) ((xmlChar *) val)
int isBlank(const char *str);
char *trim(char *str);
#include
/* name of the R class identifying a function that wants the xmlParserCtxt as the first argument. */
#define XML_PARSE_CONTEXT_FUNCTION "XMLParserContextFunction"
#define XML_PARSER_CONTEXT_TYPE_NAME "XMLParserContext"
USER_OBJECT_ RS_XML(invokeFunction)(USER_OBJECT_ fun, USER_OBJECT_ opArgs, USER_OBJECT_ state, xmlParserCtxtPtr ctx);
USER_OBJECT_ RS_XML(findFunction)(const char *opName, USER_OBJECT_ functions);
void RS_XML(SetNames)(int n, const char *cnames[], USER_OBJECT_ ans);
int RS_XML(SetClassName)(const char *name, USER_OBJECT_ target);
SEXP R_makeRefObject(void *ref, const char *className);
#ifndef SET_CLASS_NAME
#define SET_CLASS_NAME(localClassName, target) RS_XML(SetClassName)((localClassName), (target))
#endif
#ifdef LIBXML2
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
int xmlHashSize(xmlHashTablePtr table);
#endif
void RSXML_setErrorHandlers(void);
USER_OBJECT_ RS_XML(RecursiveApply)(USER_OBJECT_ top, USER_OBJECT_ func, USER_OBJECT_ klasses);
USER_OBJECT_ RS_XML(HtmlParseTree)(USER_OBJECT_ fileName, USER_OBJECT_ converterFunctions,
USER_OBJECT_ skipBlankLines, USER_OBJECT_ replaceEntities,
USER_OBJECT_ asText, USER_OBJECT_ trim, USER_OBJECT_ isURL);
USER_OBJECT_ RS_XML(getDTD)(USER_OBJECT_ dtdFileName, USER_OBJECT_ externalId,
USER_OBJECT_ asText, USER_OBJECT_ isURL, USER_OBJECT_ errorFun);
USER_OBJECT_ RS_XML(libxmlVersion)(void);
USER_OBJECT_
RS_XML(Parse)(USER_OBJECT_ fileName, USER_OBJECT_ handlers, USER_OBJECT_ endElementHandlers, USER_OBJECT_ addContext,
USER_OBJECT_ ignoreBlanks, USER_OBJECT_ useTagName, USER_OBJECT_ asText,
USER_OBJECT_ trim, USER_OBJECT_ useExpat, USER_OBJECT_ stateObject,
USER_OBJECT_ replaceEntities, USER_OBJECT_ validate, USER_OBJECT_ saxVersion,
USER_OBJECT_ branches, USER_OBJECT_ useDotNames, USER_OBJECT_ errorFun,
USER_OBJECT_ manageMemory, USER_OBJECT_ r_encoding);
/*
USER_OBJECT_ RS_XML(Parse)(USER_OBJECT_ fileName, USER_OBJECT_ handlers, USER_OBJECT_ addContext,
USER_OBJECT_ ignoreBlanks, USER_OBJECT_ useTagName, USER_OBJECT_ asText,
USER_OBJECT_ trim, USER_OBJECT_ useExpat, USER_OBJECT_ stateObject,
USER_OBJECT_ replaceEntities, USER_OBJECT_ validate);
*/
USER_OBJECT_
RS_XML(ParseTree)(USER_OBJECT_ fileName, USER_OBJECT_ converterFunctions,
USER_OBJECT_ skipBlankLines, USER_OBJECT_ replaceEntities,
USER_OBJECT_ asText, USER_OBJECT_ trim, USER_OBJECT_ validate,
USER_OBJECT_ getDTD, USER_OBJECT_ isURL,
USER_OBJECT_ addNamespaceAttributes, USER_OBJECT_ useInternalNodes,
USER_OBJECT_ s_useHTML, USER_OBJECT_ isSchema, USER_OBJECT_ fullNamespaceInfo, USER_OBJECT_ r_encoding,
USER_OBJECT_ useDotNames, USER_OBJECT_ xinclude, USER_OBJECT_ errorFun,
USER_OBJECT_ manageMemory, USER_OBJECT_ r_parserOptions, USER_OBJECT_ r_rootFirst);
USER_OBJECT_ R_newXMLDtd(USER_OBJECT_ sdoc, USER_OBJECT_ sname, USER_OBJECT_ sexternalID, USER_OBJECT_ ssysID, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_newXMLDoc(USER_OBJECT_ dtd, USER_OBJECT_ namespaces, USER_OBJECT_ isHTML);
USER_OBJECT_ R_newXMLNode(USER_OBJECT_ name, USER_OBJECT_ attrs, USER_OBJECT_ nameSpace, USER_OBJECT_ sdoc, USER_OBJECT_ namespaceDefinitions, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_newXMLTextNode(USER_OBJECT_ value, USER_OBJECT_ sdoc, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_xmlNewComment(USER_OBJECT_ str, USER_OBJECT_ sdoc, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_newXMLCDataNode(USER_OBJECT_ sdoc, USER_OBJECT_ value, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_newXMLPINode(USER_OBJECT_ sdoc, USER_OBJECT_ name, USER_OBJECT_ content, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_xmlNewNs(USER_OBJECT_ sdoc, USER_OBJECT_ shref, USER_OBJECT_ sprefix);
USER_OBJECT_ R_xmlSetNs(USER_OBJECT_ s_node, USER_OBJECT_ s_ns, USER_OBJECT_ append);
USER_OBJECT_ R_insertXMLNode(USER_OBJECT_ node, USER_OBJECT_ parent, USER_OBJECT_ r_at, USER_OBJECT_ shallow) ;
USER_OBJECT_ R_saveXMLDOM(USER_OBJECT_ sdoc, USER_OBJECT_ sfileName, USER_OBJECT_ compression, USER_OBJECT_ sindent,
USER_OBJECT_ prefix, USER_OBJECT_ r_encoding);
USER_OBJECT_ RS_XML_xmlNodeNumChildren(USER_OBJECT_ snode);
USER_OBJECT_ R_createXMLNodeRef(xmlNodePtr node, USER_OBJECT_ finalize);
USER_OBJECT_ R_createXMLDocRef(xmlDocPtr doc);
USER_OBJECT_ R_xmlCatalogResolve(SEXP r_id, SEXP type, USER_OBJECT_ debug);
USER_OBJECT_ RS_XML_setDoc(USER_OBJECT_ snode, USER_OBJECT_ sdoc);
USER_OBJECT_ RS_XML_unsetDoc(USER_OBJECT_ snode, USER_OBJECT_ unlink, USER_OBJECT_ r_parent, USER_OBJECT_ recursive);
USER_OBJECT_ RS_XML_printXMLNode(USER_OBJECT_ node, USER_OBJECT_ level, USER_OBJECT_ format, USER_OBJECT_ indent, USER_OBJECT_ r_encoding, USER_OBJECT_ r_encoding_int);
USER_OBJECT_ RS_XML_dumpHTMLDoc(USER_OBJECT_ r_node, USER_OBJECT_ format, USER_OBJECT_ r_encoding, USER_OBJECT_ indent, USER_OBJECT_ outFile);
USER_OBJECT_ RS_XML_removeChildren(USER_OBJECT_ s_node, USER_OBJECT_ kids, USER_OBJECT_ freeNode);
USER_OBJECT_ RS_XML_clone(USER_OBJECT_ obj, USER_OBJECT_ recursive, USER_OBJECT_ addFinalizer);
USER_OBJECT_ R_xmlRootNode(USER_OBJECT_ sdoc, USER_OBJECT_ skipDtd, USER_OBJECT_ manageMemory);
SEXP RS_XML_xpathEval(SEXP sdoc, SEXP r_node, SEXP path, SEXP namespaces, SEXP fun, SEXP charEncoding,
SEXP manageMemory, SEXP xpathFuns, SEXP anonFuns);
USER_OBJECT_ RS_XML_xmlNodeChildrenReferences(USER_OBJECT_ snode, USER_OBJECT_ r_addNames, USER_OBJECT_ manageMemory);
USER_OBJECT_ RS_XML(internalNodeNamespaceDefinitions)(USER_OBJECT_ r_node, USER_OBJECT_ recursive);
USER_OBJECT_ RS_XML(getDefaultValiditySetting)(USER_OBJECT_ val);
SEXP RS_XML_freeDoc(SEXP ref);
SEXP RS_XML_setRootNode(USER_OBJECT_ r_doc, USER_OBJECT_ r_node);
USER_OBJECT_ R_getNodeChildByIndex(USER_OBJECT_ snode, USER_OBJECT_ r_index, USER_OBJECT_ manageMemory);
SEXP RS_XML_setDocEl(SEXP r_node, SEXP r_doc);
USER_OBJECT_ RS_XML_isDescendantOf(USER_OBJECT_ r_node, USER_OBJECT_ r_top, USER_OBJECT_ strict);
SEXP RS_XML_getStructuredErrorHandler(void);
SEXP RS_XML_setStructuredErrorHandler(SEXP els);
SEXP R_convertDOMToHashTree(SEXP rnode, SEXP env, SEXP childrenEnv, SEXP parentEnv);
SEXP R_parseURI(SEXP r_uri);
SEXP R_getXMLFeatures(void);
SEXP R_xmlReadMemory(SEXP r_txt, SEXP len, SEXP r_encoding, SEXP r_options, SEXP r_base);
SEXP R_xmlReadFile(SEXP r_filename, SEXP r_encoding, SEXP r_options);
USER_OBJECT_ R_libxmlTypeTable_names(USER_OBJECT_ table, USER_OBJECT_ s_elType) ;
USER_OBJECT_ R_libxmlTypeTable_lookup(USER_OBJECT_ table, USER_OBJECT_ name, USER_OBJECT_ s_elType);
SEXP RS_XML_xmlSchemaValidateDoc(SEXP r_schema, SEXP r_doc, SEXP r_options, SEXP r_errorHandlers);
SEXP R_XML_indexOfChild(SEXP r_node);
SEXP RS_XML_xmlStopParser(SEXP r_context);
SEXP R_clearNodeMemoryManagement(SEXP r_node);
SEXP R_XMLInternalDocument_free(SEXP sdoc);
SEXP R_addXMLInternalDocument_finalizer(SEXP sdoc, SEXP fun);
USER_OBJECT_ R_createXMLNode(USER_OBJECT_ snode, USER_OBJECT_ handlers, USER_OBJECT_ r_trim, USER_OBJECT_ r_skipBlankLines);
USER_OBJECT_ RS_XML_xmlNodeName(USER_OBJECT_ snode);
USER_OBJECT_ RS_XML_xmlNodeNamespace(USER_OBJECT_ snode);
USER_OBJECT_ RS_XML_xmlNodeAttributes(USER_OBJECT_ snode, USER_OBJECT_ addNamespaces, USER_OBJECT_ addNamespaceURLs);
SEXP R_xmlNodeValue(SEXP node, SEXP raw, SEXP r_encoding);
SEXP R_setXMLInternalTextNode_value(SEXP node, SEXP value);
USER_OBJECT_ RS_XML_xmlNodeParent(USER_OBJECT_ snode, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_getXMLNsRef(USER_OBJECT_ r_node);
SEXP R_setXMLInternalTextNode_noenc(SEXP node);
SEXP R_isNodeChildOfAt(SEXP rkid, SEXP rnode, SEXP rat);
SEXP R_findXIncludeStartNodes(SEXP r_root, SEXP manageMemory);
SEXP RS_XML_removeAllNodeNamespaces(SEXP s_node);
SEXP RS_XML_removeNodeNamespaces(SEXP s_node, SEXP r_ns);
SEXP R_matchNodesInList(SEXP r_nodes, SEXP r_target, SEXP r_nomatch);
USER_OBJECT_ RS_XML_copyNodesToDoc(USER_OBJECT_ s_node, USER_OBJECT_ s_doc, USER_OBJECT_ manageMemory);
USER_OBJECT_ RS_XML_getDocumentName(USER_OBJECT_ sdoc);
USER_OBJECT_ RS_XML_xmlXIncludeProcessFlags(USER_OBJECT_ r_doc, USER_OBJECT_ r_flags);
USER_OBJECT_ RS_XML_xmlXIncludeProcessTreeFlags(USER_OBJECT_ r_node, USER_OBJECT_ r_flags);
USER_OBJECT_ RS_XML(internalNodeNamespaceDefinitions)(USER_OBJECT_ r_node, USER_OBJECT_ recursive);
Rboolean R_isInstanceOf(USER_OBJECT_ obj, const char *klass);
USER_OBJECT_ RS_XML_addNodeAttributes(USER_OBJECT_ s_node, USER_OBJECT_ attrs);
USER_OBJECT_ RS_XML_removeNodeAttributes(USER_OBJECT_ s_node, USER_OBJECT_ attrs, USER_OBJECT_ asNamespace);
USER_OBJECT_ RS_XML_getNsList(USER_OBJECT_ s_node, USER_OBJECT_ asRef);
USER_OBJECT_ RS_XML_setNodeName(USER_OBJECT_ s_node, USER_OBJECT_ s_name);
USER_OBJECT_ R_xmlNsAsCharacter(USER_OBJECT_ s_ns);
USER_OBJECT_ R_createXMLNsRef(xmlNsPtr ns);
USER_OBJECT_ RS_XML_getNextSibling(USER_OBJECT_ node, USER_OBJECT_ s_prev, USER_OBJECT_ manageMemory);
USER_OBJECT_ R_getXMLNodeDocument(USER_OBJECT_ s_node);
USER_OBJECT_ RS_XML_createDocFromNode(USER_OBJECT_ s_node);
SEXP R_removeInternalNode(SEXP r_node, SEXP r_free);
USER_OBJECT_ RS_XML_replaceXMLNode(USER_OBJECT_ r_old, USER_OBJECT_ r_new, USER_OBJECT_ manageMemory);
USER_OBJECT_ RS_XML_xmlAddSiblingAt(USER_OBJECT_ r_to, USER_OBJECT_ r_node, USER_OBJECT_ r_before, USER_OBJECT_ manageMemory);
SEXP RS_XML_clearCatalog(void);
SEXP RS_XML_loadCatalog(SEXP catalogs);
SEXP RS_XML_catalogAdd(SEXP orig, SEXP replace, SEXP type);
SEXP RS_XML_catalogDump(SEXP fileName);
void R_xmlFreeDoc(SEXP ref);
USER_OBJECT_ RS_XML_setDocumentName(USER_OBJECT_ sdoc, USER_OBJECT_ sname);
USER_OBJECT_ RS_XML_setKeepBlanksDefault(USER_OBJECT_ val);
SEXP RS_XML_setNS(SEXP s_node, SEXP r_ns);
SEXP stop(const char *className, const char *msg, ...);
SEXP RSXML_structuredStop(SEXP errorFun, xmlErrorPtr err);
void R_xmlStructuredErrorHandler(void *data, xmlErrorPtr err);
SEXP R_getDocEncoding(SEXP r_doc);
SEXP R_getLineNumber(SEXP r_node);
SEXP R_addXMLNodeFinalizer(SEXP r_node);
extern int R_numXMLDocs, R_numXMLDocsFreed;
SEXP CreateCharSexpWithEncoding(const xmlChar *encoding, const xmlChar *str);
#define DECL_ENCODING_FROM_NODE(node) const xmlChar *encoding = node->doc ? node->doc->encoding : NULL;
#define DECL_ENCODING_FROM_DOC(doc) const xmlChar *encoding = doc->encoding;
#define DECL_ENCODING_FROM_EVENT_PARSER(parserData) const xmlChar *encoding = parserData->ctx->encoding;
#define R_USE_XML_ENCODING 1
#ifdef R_USE_XML_ENCODING
#undef COPY_TO_USER_STRING
//#warning "Redefining COPY_TO_USER_STRING to use encoding from XML parser"
/*
#define COPY_TO_USER_STRING(x) CreateCharSexpWithEncoding(encoding, CHAR_TO_XMLCHAR (x))
*/
// #define COPY_TO_USER_STRING(x) mkChar(CHAR_TO_XMLCHAR (x))
#define COPY_TO_USER_STRING(x) mkChar((const char *) (x))
#define ENC_COPY_TO_USER_STRING(x) CreateCharSexpWithEncoding(encoding, CHAR_TO_XMLCHAR (x))
#endif
#include
#define R_CHECK_INTERRUPTS R_CheckUserInterrupt();
//#include "NodeGC.h"
SEXP R_createXMLNodeRefDirect(xmlNodePtr node, int addFinalizer);
int R_XML_getManageMemory(USER_OBJECT_ user, xmlDocPtr doc, xmlNodePtr node);
USER_OBJECT_ R_convertXMLNsRef(SEXP r_ns);
USER_OBJECT_ R_replaceNodeWithChildren(USER_OBJECT_ r_node);
#endif
XML/src/ExpatParse.h 0000644 0001751 0000144 00000001077 13607633744 013766 0 ustar hornik users #ifndef EXPAT_PARSE_H
#define EXPAT_PARSE_H
/* */
#include "xmlparse.h"
int RS_XML(parseWithParserData)(FILE *file, RS_XMLParserData *parserData);
void RS_XML(initParser)(XML_Parser parser, RS_XMLParserData *parserData);
int RS_XML(parse)(FILE *file, USER_OBJECT_ handlers);
int RS_XML(parseBufferWithParserData)(char *buf, RS_XMLParserData *parserData);
int RS_XML(externalEntityHandler)(XML_Parser parser, const XML_Char *context,
const XML_Char *base, const XML_Char *systemId,
const XML_Char *publicId);
#endif
XML/src/Makevars.win 0000644 0001751 0000144 00000000557 13700634770 014025 0 ustar hornik users PKG_CPPFLAGS= -I${LIB_XML}/include/libxml2 -I${LIB_XML}/include -D_R_=1 -DUSE_R=1 -DUSE_XML_VERSION_H=1 -DLIBXML -DUSE_EXTERNAL_SUBSET=1 -DROOT_HAS_DTD_NODE=1 -DUMP_WITH_ENCODING=1 -DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1 -DLIBXML2=1 -DHAVE_XML_HAS_FEATURE -DLIBXML_STATIC -DNO_XML_HASH_SCANNER_RETURN=1
PKG_LIBS = -L${LIB_XML}/lib -lxml2 -liconv -lz -lws2_32
XML/src/RS_XML.h 0000644 0001751 0000144 00000001444 13607633744 012754 0 ustar hornik users /*
* See Copyright for the license status of this software.
*/
#ifndef RS_XML_H
#define RS_XML_H
#define RS_XML(a) RS_XML_##a
/* #define R_XML_DEBUG 1 */
#if 0
#if 1
#define XML_REF_COUNT_NODES 1
#else
#ifdef XML_REF_COUNT_NODES
#undef XML_REF_COUNT_NODES
#endif
#endif
#endif
/* #undef XML_REF_COUNT_NODES */
typedef enum {RS_XML_FILENAME, RS_XML_TEXT, RS_XML_CONNECTION, RS_XML_INVALID_CONTENT} RS_XML_ContentSourceType;
#ifdef _R_
#include "R.h"
#include "Rinternals.h"
#if 0
#if R_VERSION < R_Version(1, 2, 0)
#define STRING_ELT(x,i) STRING(x)[i]
#define VECTOR_ELT(x,i) VECTOR(x)[i]
#define SET_STRING_ELT(x,i,v) (STRING(x)[i]=(v))
#define SET_VECTOR_ELT(x,i,v) (VECTOR(x)[i]=(v))
#endif
#endif /* end of ignoring version details */
#endif /* end of _R_ */
#endif
XML/src/XMLTree.c 0000644 0001751 0000144 00000130072 14636530730 013155 0 ustar hornik users /**
The purpose of this file is to provide the C-level facilities
to create, modify and manage internal XML DOM nodes at the S
language level. We want to be able to use the interface defined
by xmlOutputDOM() and xmlOutputBuffer() but with an implementation
that returns a tree that is built to be used with the libxml
data structures. So the intent is to incrementally add nodes
to the tree in memory and then pass this to libxml to add it to
another tree or write it to a file, etc.
The essential public/high-level functionality provided by the the S-leve interface
for building trees consists of:
1) addTag
2) closeTag
3) addComment
4) value
addNode
a) getOpenTag
b) reset
*/
#include "RSCommon.h"
#include "RS_XML.h"
#ifdef FROM_GNOME_XML_DIR
#include
#include
#else
#include
#include
#include
#endif
#define R_USE_XML_ENCODING 1
#include "Utils.h" /* R_createXMLNodeRef, Encoding macros. */
#include "NodeGC.h"
#ifdef USE_OLD_ROOT_CHILD_NAMES
# define XML_ROOT(n) (n)->childs
#else
# define XML_ROOT(n) (n)->xmlRootNode
#endif
void incrementDocRef(xmlDocPtr doc);
int getNodeCount(xmlNodePtr node);
void incrementDocRefBy(xmlDocPtr doc, int num);
void RS_XML_recursive_unsetListDoc(xmlNodePtr list);
/**
Create a libxml comment node and return it as an S object
referencing this value.
*/
USER_OBJECT_
R_xmlNewComment(USER_OBJECT_ str, USER_OBJECT_ sdoc, USER_OBJECT_ manageMemory)
{
xmlNodePtr node;
xmlDocPtr doc = NULL;
xmlChar *txt;
if(GET_LENGTH(sdoc))
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
txt = CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(str, 0)));
node = doc ? xmlNewDocComment(doc, txt) : xmlNewComment(txt);
return(R_createXMLNodeRef(node, manageMemory));
}
USER_OBJECT_
R_newXMLTextNode(USER_OBJECT_ value, USER_OBJECT_ sdoc, SEXP manageMemory)
{
xmlNodePtr node;
xmlDocPtr doc = NULL;
xmlChar *txt;
if(GET_LENGTH(sdoc))
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
txt = CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(value, 0)));
if(doc)
node = xmlNewDocTextLen(doc, txt, (int)strlen(XMLCHAR_TO_CHAR(txt)));
else
node = xmlNewText(txt);
return(R_createXMLNodeRef(node, manageMemory));
}
USER_OBJECT_
R_newXMLCDataNode(USER_OBJECT_ sdoc, USER_OBJECT_ value, USER_OBJECT_ manageMemory)
{
xmlDocPtr doc = NULL;
xmlNodePtr node;
const char *tmp;
if(GET_LENGTH(sdoc))
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
tmp = CHAR_DEREF(STRING_ELT(value,0));
node = xmlNewCDataBlock(doc, CHAR_TO_XMLCHAR(tmp), (int)strlen(tmp));
return(R_createXMLNodeRef(node, manageMemory));
}
USER_OBJECT_
R_newXMLPINode(USER_OBJECT_ sdoc, USER_OBJECT_ name, USER_OBJECT_ content, USER_OBJECT_ manageMemory)
{
xmlNodePtr node;
node = xmlNewPI(CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(name, 0))), CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(content, 0))));
return( R_createXMLNodeRef(node, manageMemory) );
}
USER_OBJECT_
R_newXMLNode(USER_OBJECT_ name, USER_OBJECT_ attrs, USER_OBJECT_ nameSpace, USER_OBJECT_ sdoc,
USER_OBJECT_ nameSpaceDefinitions, USER_OBJECT_ manageMemory)
{
xmlDocPtr doc = NULL;
xmlNsPtr ns = NULL;
xmlNodePtr node;
if(GET_LENGTH(sdoc)) {
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
if(doc->type != XML_DOCUMENT_NODE && doc->type != XML_HTML_DOCUMENT_NODE)
doc = doc->doc;
}
if(GET_LENGTH(nameSpace) > 0) {
/* Need the default namespace and then also any other */
CHAR_DEREF(STRING_ELT(nameSpace, 0));
}
node = xmlNewDocNode(doc, ns, CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(name, 0))), NULL);
if(doc && XML_ROOT(doc) == NULL) {
XML_ROOT(doc) = node;
}
return( R_createXMLNodeRef(node, manageMemory) );
}
USER_OBJECT_
RS_XML_getNextSibling(USER_OBJECT_ s_node, USER_OBJECT_ s_prev, USER_OBJECT_ manageMemory)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node), ptr;
ptr = LOGICAL(s_prev)[0] ? node->next : node->prev;
return(ptr ? R_createXMLNodeRef(ptr, manageMemory) : NULL_USER_OBJECT);
}
/*
Add attributes to an existing node.
At present, doesn't check for duplicates.
Can do this in C or in R, but need to remove existing values,
and ensure that namespace considerations are handled properly.
*/
USER_OBJECT_
RS_XML_addNodeAttributes(USER_OBJECT_ s_node, USER_OBJECT_ attrs)
{
int i, n;
USER_OBJECT_ attr_names;
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
n = GET_LENGTH(attrs);
attr_names = GET_NAMES(attrs);
for(i = 0; i < n; i++) {
xmlSetProp(node, CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(attr_names, i))), CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(attrs, i))));
}
return(ScalarInteger(n));
}
USER_OBJECT_
RS_XML_setNodeName(USER_OBJECT_ s_node, USER_OBJECT_ s_name)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
xmlChar *name = CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(s_name, 0)));
xmlNodeSetName(node, name);
return(NULL_USER_OBJECT);
}
#if 0
int
removeNodeNamespace(xmlNodePtr node, xmlNsPtr p)
{
if(!p)
return(0);
if(!node->prev)
node->ns = p->next;
else
p->v->next = p->next;
return(1);
}
#endif
int
removeNodeNamespaceByName(xmlNodePtr node, const char * const id)
{
xmlNsPtr p, prev;
if(!node->nsDef)
return(0);
prev = node->nsDef;
p = node->nsDef;
if(!(id[0] && !p->prefix) || (p->prefix && strcmp((const char *)p->prefix, id) == 0)) {
/*XXX Free or not */
if(node->ns == p)
node->ns = NULL;
node->nsDef = p->next;
return(1);
}
while(1) {
if((!id[0] && !p->prefix) || (p->prefix && strcmp((const char *)p->prefix, id) == 0)) {
prev->next = p->next;
if(node->ns == p)
node->ns = NULL;
return(1);
}
prev = p;
p = p->next;
}
return(0);
}
SEXP
R_replaceDummyNS(USER_OBJECT_ s_node, USER_OBJECT_ newNS, USER_OBJECT_ prefix)
{
xmlNodePtr node;
if(TYPEOF(s_node) != EXTPTRSXP) {
Rf_error("non external pointer passed to R_replaceDummyNS");
}
node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
removeNodeNamespaceByName(node, CHAR(STRING_ELT(prefix, 0)));
return(R_xmlSetNs(s_node, newNS, ScalarLogical(0)));
// return(newNS);
}
SEXP
RS_XML_removeAllNodeNamespaces(SEXP s_node)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
xmlNsPtr p, tmp;
int n = 0;
if(!node)
return(ScalarLogical(FALSE));
p = node->nsDef;
while(p) {
if(node->ns == p) {
node->ns = NULL;
}
tmp = p;
p = p->next;
if(0 && tmp->type)
xmlFreeNs(tmp);
n++;
}
node->nsDef = NULL;
return(ScalarInteger(n));
}
SEXP
RS_XML_removeNodeNamespaces(SEXP s_node, SEXP r_ns)
{
int i, n;
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
SEXP el, ans;
const char *prefix;
// xmlNsPtr p;
// int t = TYPEOF(r_ns);
n = Rf_length(r_ns);
PROTECT(ans = allocVector(LGLSXP, n));
for(i = 0; i < n; i++) {
el = VECTOR_ELT(r_ns, i);
if(TYPEOF(el) == STRSXP) {
prefix = CHAR(STRING_ELT(el, 0));
LOGICAL(ans)[i] = removeNodeNamespaceByName(node, prefix);
} else if(TYPEOF(el) == EXTPTRSXP) {
xmlNsPtr p = (xmlNsPtr) R_ExternalPtrAddr(el);
LOGICAL(ans)[i] = removeNodeNamespaceByName(node, (const char *)p->prefix);
}
}
UNPROTECT(1);
return(ans);
}
/*
attrs is a vector whose names identify
*/
USER_OBJECT_
RS_XML_removeNodeAttributes(USER_OBJECT_ s_node, USER_OBJECT_ attrs, USER_OBJECT_ asNamespace)
{
int i, n;
USER_OBJECT_ attr_names, ans;
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
n = GET_LENGTH(attrs);
PROTECT(ans = NEW_LOGICAL(n));
attr_names = GET_NAMES(attrs);
for(i = 0; i < n; i++) {
if(TYPEOF(attrs) == INTSXP) {
int which = INTEGER(attrs)[i] - i - 1;
xmlAttrPtr p;
int j = 0;
p = node->properties;
while(j < which && p) {
p = p->next;
j++;
}
xmlUnsetNsProp(node, p->ns, p->name);
/*
if(p)
xmlFree(p);
*/
} else if(LOGICAL(asNamespace)[0]) {
xmlNsPtr ns = NULL;
xmlChar *id;
id = CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(attr_names, i)));
SEXP ai = VECTOR_ELT(attrs, i);
if(TYPEOF(ai) == EXTPTRSXP)
ns = (xmlNsPtr) R_ExternalPtrAddr(ai);
if(id[0])
INTEGER(ans)[i] = xmlUnsetNsProp(node, ns, id);
} else
INTEGER(ans)[i] = xmlUnsetProp(node, CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(attrs, i))));
}
UNPROTECT(1);
return(ans);
}
#define GET_R_XML_NODE_PTR(x) (xmlNodePtr) R_ExternalPtrAddr(s_node);
USER_OBJECT_
RS_XML_getNsList(USER_OBJECT_ s_node, USER_OBJECT_ asRef)
{
xmlNodePtr node = GET_R_XML_NODE_PTR(s_node);
xmlNsPtr *els, el;
int n = 0, i;
USER_OBJECT_ ans, names;
DECL_ENCODING_FROM_NODE(node)
els = xmlGetNsList(node->doc, node);
if(!els)
return(NULL_USER_OBJECT);
el = *els;
while(el) {
n++;
el = el->next;
}
el = *els;
if(LOGICAL(asRef)[0]) {
PROTECT(ans = NEW_LIST(n));
PROTECT(names = NEW_CHARACTER(n));
for(i = 0; i < n ; i++, el = el->next) {
if(el->prefix)
SET_STRING_ELT(names, i, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(el->prefix)));
SET_VECTOR_ELT(ans, i, R_createXMLNsRef(el));
}
} else {
PROTECT(ans = NEW_CHARACTER(n));
PROTECT(names = NEW_CHARACTER(n));
for(i = 0; i < n ; i++, el = el->next) {
if(el->prefix)
SET_STRING_ELT(names, i, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(el->prefix)));
if(el->href)
SET_STRING_ELT(ans, i, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(el->href)));
}
}
SET_NAMES(ans, names);
UNPROTECT(2);
return(ans);
}
SEXP
R_removeInternalNode(SEXP r_node, SEXP r_free)
{
xmlNodePtr node;
int n = GET_LENGTH(r_node), i;
for(i = 0; i < n; i++) {
SEXP el = VECTOR_ELT(r_node, i);
if(TYPEOF(el) != EXTPTRSXP) {
Rf_error("removeInternalNode needs ans external pointer object");
}
node = (xmlNodePtr) R_ExternalPtrAddr(el);
if(!node) {
Rf_warning("removeInternalNode ignoring a NULL external pointer object");
}
xmlUnlinkNode(node);
if(LOGICAL(r_free)[i])
xmlFreeNode(node);
}
return(NULL_USER_OBJECT);
}
SEXP
RS_XML_setRootNode(USER_OBJECT_ r_doc, USER_OBJECT_ r_node)
{
xmlDocPtr doc;
xmlNodePtr node;
doc = (xmlDocPtr) R_ExternalPtrAddr(r_doc);
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
/* Set the reference counting information. */
//if(!node->doc)
// node->doc = doc;
xmlDocSetRootElement(doc, node);
return(ScalarLogical(TRUE));
}
SEXP
R_isNodeChildOfAt(SEXP rkid, SEXP rnode, SEXP rat)
{
int i=0, at;
xmlNodePtr kid, node, ptr;
node = (xmlNodePtr) R_ExternalPtrAddr(rnode);
kid = (xmlNodePtr) R_ExternalPtrAddr(rkid);
if(!node || !kid || !kid->parent)
return(ScalarLogical(FALSE));
at = INTEGER(rat)[0] - 1;
ptr = node->children;
while(i < at && ptr) {
ptr = ptr->next;
i++;
}
return(ScalarLogical(ptr == kid));
}
/**
Add the internal XML node represented by the S object @node
as a child of the XML node represented by the S object @parent.
*/
USER_OBJECT_
R_insertXMLNode(USER_OBJECT_ node, USER_OBJECT_ parent, USER_OBJECT_ at, USER_OBJECT_ shallow)
{
// check is currently set but unused.
xmlNodePtr n, p, /*check,*/ tmp = NULL;
if(TYPEOF(parent) != EXTPTRSXP) {
Rf_error("R_insertXMLNode expects XMLInternalNode objects for the parent node");
}
if(IS_LIST(node)) {
int i;
for(i = 0; i < GET_LENGTH(node); i++)
R_insertXMLNode(VECTOR_ELT(node, i), parent, R_NilValue/*XXX*/, shallow);
return(NULL_USER_OBJECT);
}
if(TYPEOF(node) == STRSXP) {
int i;
p = (xmlNodePtr) R_ExternalPtrAddr(parent);
for(i = 0; i < GET_LENGTH(node); i++) {
n = xmlNewText((const xmlChar *)CHAR(STRING_ELT(node, i)));
xmlAddChild(p, n);
}
return(NULL_USER_OBJECT);
}
if(TYPEOF(node) != EXTPTRSXP) {
Rf_error("R_insertXMLNode expects XMLInternalNode objects");
}
p = (xmlNodePtr) R_ExternalPtrAddr(parent);
n = (xmlNodePtr) R_ExternalPtrAddr(node);
if(!p || !n) {
Rf_error("either the parent or child node is NULL");
}
#if 0
if(0 && n->parent == p || n->parent) {
/*XX Need to decrement the reference count if there is a document. */
xmlUnlinkNode(n);
}
#endif
/* Make certain the nodes belong to this document if they already belong to another by copying. */
if(n->doc && n->doc != p->doc) {
n = xmlDocCopyNode(n, p->doc, 1);
} else if(!n->doc && LOGICAL(shallow)[0]) {
/* XXX This is intended to avoid setting all the nodes to this document and then having to undo that
later on.*/
n->doc = p->doc;
}
switch(p->type) {
case XML_ELEMENT_NODE:
/* Need to be careful that if n is a text node, it could be
* absorbed into its nearest sibling and then freed. So we
take a copy of the text node*/
if(n->type == XML_TEXT_NODE) {
tmp = xmlNewText(n->content);
/* tmp = xmlCopyNode(n, 1); */
} else {
tmp = n;
if(n->_private) {
#ifdef R_XML_DEBUG
fprintf(stderr, "insertXMLNode: %p to %p, incrementing document (%p) %d\n", n, p, p->doc, *(int *) n->_private);
#endif
if(p->doc)
incrementDocRefBy(p->doc, getNodeCount(n));
}
}
/* check = */ xmlAddChild(p, tmp);
#if 0
/* XXXX */
if(n->type == XML_TEXT_NODE && check != tmp)
xmlFreeNode(tmp);
#endif
break;
case XML_DOCUMENT_NODE:
case XML_HTML_DOCUMENT_NODE:
/*check = */ xmlAddChild(p, n);
incrementDocRef((xmlDocPtr) p);
break;
case XML_PI_NODE:
xmlAddSibling(p, n);
break;
default:
{
Rf_warning("ignoring request to add child (types parent: %d, child %d)",
p->type, n->type);
}
break;
}
#if 0
/* This is where we handle the case where n being a text node may
* have been freed by xmlAddChild. */
if(check != n) {
fprintf(stderr, "xmlAddChild() may have freed the node\n");fflush(stderr);
R_ClearExternalPtr(node);
}
#endif
/* ??? internal_incrementNodeRefCount(n); */
return(NULL_USER_OBJECT);
}
USER_OBJECT_
RS_XML_xmlAddSiblingAt(USER_OBJECT_ r_to, USER_OBJECT_ r_node, USER_OBJECT_ r_after, USER_OBJECT_ manageMemory)
{
xmlNodePtr p, n, ans;
xmlNodePtr (*f)(xmlNodePtr, xmlNodePtr);
if(TYPEOF(r_to) != EXTPTRSXP) {
Rf_error("RS_XML_xmlAddSiblingAt expects XMLInternalNode objects for the parent node");
}
if(TYPEOF(r_node) != EXTPTRSXP) {
Rf_error("RS_XML_xmlAddSiblingAt expects XMLInternalNode objects for the node to add");
}
p = (xmlNodePtr) R_ExternalPtrAddr(r_to);
n = (xmlNodePtr) R_ExternalPtrAddr(r_node);
if(!p || !n) {
Rf_error("either the parent or child node is NULL");
}
f = LOGICAL(r_after)[0] ? xmlAddNextSibling : xmlAddPrevSibling ;
ans = f(p, n);
/* If adding to the root node and inserting a node before the
* current first child, update the document.*/
if(p->doc && p->doc->children == p && n->next == p)
p->doc->children = n;
incrementDocRefBy(p->doc, getNodeCount(n));
return(R_createXMLNodeRef(ans, manageMemory));
}
USER_OBJECT_
RS_XML_replaceXMLNode(USER_OBJECT_ r_old, USER_OBJECT_ r_new, USER_OBJECT_ manageMemory)
{
xmlNodePtr Old, New, ans;
if(TYPEOF(r_old) != EXTPTRSXP && TYPEOF(r_new) != EXTPTRSXP) {
Rf_error("R_replaceXMLNode expects XMLInternalNode objects");
}
Old = (xmlNodePtr) R_ExternalPtrAddr(r_old);
New = (xmlNodePtr) R_ExternalPtrAddr(r_new);
if(!Old) {
Rf_error("NULL value for XML node to replace");
}
ans = xmlReplaceNode(Old, New);
return(R_createXMLNodeRef(ans, manageMemory));
}
/*
a = newXMLNode("a", newXMLNode("b", newXMLNode("c", 3)), newXMLNode("d", "text"))
removeChildren(a, 2)
*/
USER_OBJECT_
RS_XML_removeChildren(USER_OBJECT_ s_node, USER_OBJECT_ kids, USER_OBJECT_ freeNode)
{
int i, n;
USER_OBJECT_ ans;
xmlNodePtr node = NULL, tmp;
if(GET_LENGTH(s_node)) {
node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
if(!node) {
Rf_error("Empty XMLInternalNode");
}
}
n = GET_LENGTH(kids);
PROTECT(ans = NEW_LOGICAL(n));
for(i = 0; i < n; i++) {
tmp = (xmlNodePtr) R_ExternalPtrAddr(VECTOR_ELT(kids, i));
if(!tmp)
continue;
if(node && tmp->parent != node) {
Rf_error("trying to remove a child node from a different parent node");
}
xmlUnlinkNode(tmp);
if(LOGICAL(freeNode)[0])
xmlFreeNode(tmp);
LOGICAL(ans)[i] = TRUE;
}
UNPROTECT(1);
return(ans);
}
USER_OBJECT_
R_xmlRootNode(USER_OBJECT_ sdoc, USER_OBJECT_ skipDtd, USER_OBJECT_ manageMemory)
{
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
xmlNodePtr node = NULL;
if(doc)
node = doc->children;
if(!node) {
Rf_warning("empty XML document");
return(NULL_USER_OBJECT);
}
if(LOGICAL(skipDtd)[0]) {
while(node && node->type != XML_ELEMENT_NODE /* (node->type == XML_DTD_NODE || node->type == XML_COMMENT_NODE) */) {
node = node->next;
}
}
if(node == NULL)
return(NULL_USER_OBJECT);
return(R_createXMLNodeRef(node, manageMemory));
}
/**
Create an S object representing a newly created internal
XML document object.
*/
int R_numXMLDocs = 0;
int R_numXMLDocsFreed = 0;
USER_OBJECT_
R_newXMLDoc(USER_OBJECT_ dtd, USER_OBJECT_ namespaces, USER_OBJECT_ isHTML)
{
xmlDocPtr doc;
if(LOGICAL(isHTML)[0]) {
const char *d = (TYPEOF(dtd) == STRSXP && Rf_length(dtd)) ?
CHAR_DEREF(STRING_ELT(dtd, 0)) : NULL;
if(d[0] == '5')
doc = htmlNewDoc((const xmlChar *)"", NULL);
else
doc = htmlNewDocNoDtD(d && d[0] ? CHAR_TO_XMLCHAR(d) : NULL, NULL);
} else
doc = xmlNewDoc(CHAR_TO_XMLCHAR("1.0"));
R_numXMLDocs++;
return(R_createXMLDocRef(doc));
}
USER_OBJECT_
R_newXMLDtd(USER_OBJECT_ sdoc, USER_OBJECT_ sdtdName, USER_OBJECT_ sexternalID, USER_OBJECT_ ssysID, USER_OBJECT_ manageMemory)
{
xmlDocPtr doc = NULL;
xmlChar *dtdName = NULL;
xmlChar *externalID = NULL;
xmlChar *sysID = NULL;
xmlDtdPtr node;
#define GET_STR_VAL(x) \
if(GET_LENGTH(s##x) > 0) { \
x = CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(s##x, 0))); \
if(!x[0]) \
x = NULL; \
}
GET_STR_VAL(dtdName)
GET_STR_VAL(externalID)
GET_STR_VAL(sysID)
if(sdoc != NULL_USER_OBJECT && TYPEOF(sdoc) == EXTPTRSXP)
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
node = xmlNewDtd(doc, dtdName, externalID, sysID);
/* should we do this???
xmlAddChild((xmlNodePtr) doc, (xmlNodePtr) DTD);
*/
return(R_createXMLNodeRef((xmlNodePtr) node, manageMemory));
}
/*
*/
USER_OBJECT_
R_xmlSetNs(USER_OBJECT_ s_node, USER_OBJECT_ s_ns, USER_OBJECT_ append)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
xmlNsPtr ns = NULL;
if(s_ns != NULL_USER_OBJECT)
ns = (xmlNsPtr) R_ExternalPtrAddr(s_ns);
if(LOGICAL(append)[0]) {
xmlNsPtr el;
if(!node->ns)
xmlSetNs(node, xmlNewNs(node, NULL, NULL));
el = node->ns;
while(el->next)
el = el->next;
el->next = ns;
} else
xmlSetNs(node, ns);
return(s_ns);
}
#if 0
/* remove if the above is sufficient. */
SEXP
RS_XML_setNS(SEXP s_node, SEXP r_ns)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
xmlNsPtr ns = (xmlNsPtr) R_ExternalPtrAddr(r_ns);
xmlSetNS(node, ns);
return(NULL_USER_OBJECT);
}
#endif
static const char *DummyNamespaceHREF = "";
USER_OBJECT_
R_xmlNewNs(USER_OBJECT_ sdoc, USER_OBJECT_ shref, USER_OBJECT_ sprefix)
{
xmlNodePtr doc = (xmlNodePtr) R_ExternalPtrAddr(sdoc);
const char *href = Rf_length(shref) == 0 ? DummyNamespaceHREF : CHAR_DEREF(STRING_ELT(shref, 0));
const char *prefix = NULL;
xmlNsPtr ns;
if(Rf_length(sprefix)) {
prefix = CHAR_DEREF(STRING_ELT(sprefix, 0));
if(!prefix[0])
prefix = NULL;
}
if(!href[0])
href = NULL;
ns = xmlNewNs(doc, CHAR_TO_XMLCHAR(href), CHAR_TO_XMLCHAR(prefix));
return(R_createXMLNsRef(ns)); /*XXX */
}
USER_OBJECT_
RS_XML_clone(USER_OBJECT_ obj, USER_OBJECT_ recursive, USER_OBJECT_ manageMemory)
{
if(TYPEOF(obj) != EXTPTRSXP) {
Rf_error( "clone can only be applied to an internal, C-level libxml2 object");
}
if(!R_ExternalPtrAddr(obj)) {
Rf_error( "NULL value passed to clone, possibly from a previous session");
}
if(R_isInstanceOf(obj, "XMLInternalElementNode")) {
xmlNodePtr node, node_ans;
node = (xmlNodePtr) R_ExternalPtrAddr(obj);
node_ans = xmlCopyNode(node, INTEGER(recursive)[0]);
return(R_createXMLNodeRef(node_ans, manageMemory));
} else if(R_isInstanceOf(obj, "XMLInternalDocument") || R_isInstanceOf(obj, "XMLInternalDOM")) {
xmlDocPtr doc;
doc = (xmlDocPtr) R_ExternalPtrAddr(obj);
return(R_createXMLDocRef(xmlCopyDoc(doc, INTEGER(recursive)[0]))); // , manageMemory));
}
Rf_error("clone doesn't (yet) understand this internal data type");
return(NULL_USER_OBJECT); /* never reached */
}
#ifdef R_XML_DEBUG
xmlDocPtr currentDoc;
#endif
USER_OBJECT_
R_createXMLDocRef(xmlDocPtr doc)
{
SEXP ref, tmp;
#ifdef R_XML_DEBUG
currentDoc = doc;
#endif
if(!doc)
return(R_NilValue);
initDocRefCounter(doc);
incrementDocRef(doc);
#ifdef R_XML_DEBUG
fprintf(stderr, "creating document reference %s %p, count = %d\n",
doc->URL ? doc->URL : "internally created", doc,
* ((int*) doc->_private));
#endif
PROTECT(ref = R_MakeExternalPtr(doc, Rf_install("XMLInternalDocument"), R_NilValue));
PROTECT(tmp = NEW_CHARACTER(1));
SET_STRING_ELT(tmp, 0, mkChar( doc->type == XML_HTML_DOCUMENT_NODE ? "HTMLInternalDocument" : "XMLInternalDocument"));
SET_CLASS(ref, tmp);
UNPROTECT(2);
return(ref);
}
USER_OBJECT_
R_removeXMLNsRef(xmlNsPtr ns)
{
/*XXX xmlNsPtr p = (xmlNsPtr) R_ExternalPtrAddr(); */
Rf_error("C routine R_removeXMLNsRef() not implemented yet");
return(R_NilValue);
}
USER_OBJECT_
R_createXMLNsRef(xmlNsPtr ns)
{
SEXP ref, tmp;
PROTECT(ref = R_MakeExternalPtr(ns, Rf_install("XMLNamespaceRef"), R_NilValue));
PROTECT(tmp = NEW_CHARACTER(1));
SET_STRING_ELT(tmp, 0, mkChar("XMLNamespaceRef"));
SET_CLASS(ref, tmp);
UNPROTECT(2);
return(ref);
}
USER_OBJECT_
R_convertXMLNsRef(SEXP r_ns)
{
SEXP ans;
xmlNsPtr ns;
if(TYPEOF(r_ns) != EXTPTRSXP) {
Rf_error("wrong type for namespace reference");
}
ns = (xmlNsPtr) R_ExternalPtrAddr(r_ns);
PROTECT(ans = mkString((const char *)ns->href));
SET_NAMES(ans, mkString(ns->prefix ? XMLCHAR_TO_CHAR(ns->prefix) : ""));
UNPROTECT(1);
return(ans);
}
USER_OBJECT_
R_getXMLNsRef(USER_OBJECT_ r_node)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
if(!node)
return(R_NilValue);
return(node->ns ? R_createXMLNsRef(node->ns) : R_NilValue);
}
const char *
R_getInternalNodeClass(xmlElementType type)
{
const char * p = "";
switch(type) {
case XML_ELEMENT_NODE:
p = "XMLInternalElementNode";
break;
case XML_ELEMENT_DECL:
p = "XMLInternalElementDeclNode";
break;
case XML_TEXT_NODE:
p = "XMLInternalTextNode";
break;
case XML_CDATA_SECTION_NODE:
p = "XMLInternalCDataNode";
break;
case XML_ENTITY_NODE:
p = "XMLInternalEntityNode";
break;
case XML_ENTITY_REF_NODE:
p = "XMLInternalEntityRefNode";
break;
case XML_PI_NODE:
p = "XMLInternalPINode";
break;
case XML_COMMENT_NODE:
p = "XMLInternalCommentNode";
break;
case XML_NOTATION_NODE:
p = "XMLInternalNotationNode";
break;
case XML_DTD_NODE:
p = "XMLDTDNode";
break;
case XML_NAMESPACE_DECL:
p = "XMLNamespaceDeclaration";
break;
case XML_XINCLUDE_START:
p = "XMLXIncludeStartNode";
break;
case XML_XINCLUDE_END:
p = "XMLXIncludeEndNode";
break;
case XML_ENTITY_DECL:
p = "XMLInternalEntityRefNode";
break;
case XML_ATTRIBUTE_DECL:
p = "XMLAttributeDeclNode";
break;
case XML_DOCUMENT_NODE:
p = "XMLDocumentNode";
break;
case XML_HTML_DOCUMENT_NODE:
p = "XMLHTMLDocumentNode";
break;
case XML_DOCUMENT_TYPE_NODE:
p = "XMLDocumentTypeNode";
break;
case XML_DOCUMENT_FRAG_NODE:
p = "XMLDocumentFragNode";
break;
case XML_ATTRIBUTE_NODE:
p = "XMLAttributeNode";
break;
default:
p = "XMLUnknownInternalNode";
}
return(p);
}
SEXP
R_createXMLNodeRefDirect(xmlNodePtr node, int addFinalizer)
{
SEXP ref, tmp;
PROTECT(ref = R_MakeExternalPtr(node, Rf_install("XMLInternalNode"), R_NilValue));
#ifdef XML_REF_COUNT_NODES
if(addFinalizer > 0 || (addFinalizer < 0 && !IS_NOT_OUR_NODE_TO_TOUCH(node))) {
#ifdef R_XML_DEBUG
fprintf(stderr, "Creating reference with finalizer for %s (%p) '%s'\n",
node->name, node, node->type == XML_TEXT_NODE ? node->content : "");fflush(stderr);
#endif
R_RegisterCFinalizer(ref, decrementNodeRefCount);
}
/*
#else
#warning "no ref counting enabled"
*/
#endif
PROTECT(tmp = NEW_CHARACTER(3));
SET_STRING_ELT(tmp, 0, mkChar(R_getInternalNodeClass(node->type)));
SET_STRING_ELT(tmp, 1, mkChar("XMLInternalNode"));
SET_STRING_ELT(tmp, 2, mkChar("XMLAbstractNode"));
SET_CLASS(ref, tmp);
UNPROTECT(2);
return(ref);
}
/**
Used to be used as
R_XML_getManageMemory(manageMemory, node->doc, node) > 0 ? R_createXMLNodeRef() : R_createXMLNodeRefDirect(node, 0));
*/
USER_OBJECT_
R_createXMLNodeRef(xmlNodePtr node, USER_OBJECT_ finalize)
{
int *val;
int addFinalizer = 0;
if(!node)
return(NULL_USER_OBJECT);
addFinalizer = R_XML_getManageMemory(finalize, node->doc, node);
/* !IS_NOT_OUR_NODE_TO_TOUCH(node) */
if(addFinalizer && ((node->_private && ((int*)node->_private)[1] == (int) R_MEMORY_MANAGER_MARKER)
|| !node->doc || (!(IS_NOT_OUR_DOC_TO_TOUCH(node->doc))))) {
if(node->_private == NULL) {
node->_private = calloc(2, sizeof(int));
val = (int *) node->_private;
val[1] = R_MEMORY_MANAGER_MARKER;
}
val = (int *) node->_private;
(*val)++;
if(*val == 1)
incrementDocRef(node->doc);
#ifdef R_XML_DEBUG
fprintf(stderr, "creating reference to node (%s, %d) count = %d (%p) (doc = %p count = %d)\n", node->name, node->type, (int) *val, node, node->doc, (node->doc && node->doc->_private) ? ((int *)node->doc->_private)[0] : -1);
#endif
}
return(R_createXMLNodeRefDirect(node, addFinalizer /* !IS_NOT_OUR_NODE_TO_TOUCH(node) */ ));
}
/*
May not be used. Not yet.
The idea is to allow the R user to explicitly add a finalizer, like
we do for a document.
*/
SEXP
R_addXMLNodeFinalizer(SEXP r_node)
{
#ifdef XML_REF_COUNT_NODES /* ??? should this be ifndef or ifdef.?? */
// xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
R_RegisterCFinalizer(r_node, decrementNodeRefCount);
#endif
return(r_node);
}
#define ValOrNULL(x) CHAR_TO_XMLCHAR ((x && x[0] ? x : NULL))
/**
Write the XML tree/DOM to a file or into a buffer (depending on the value
of sfileName)
It would be nice to use connections, but this is not yet possible
in full generality. Later
@sdoc: the S object that is a reference to the top-level XML DOM.
@sfileName: the S object that gives the name of the file to which the
DOM should be written or alternatively, the S value `NULL' indicating
that the DOM should be dumped to a buffer and returned as an S string.
@compression: if @sfileName is the name of a file and we are not
returning the DOM as a string, then we set the compression level
to the value of this integer, unless it is omitted and specified as
the S value `NULL'.
*/
USER_OBJECT_
R_saveXMLDOM(USER_OBJECT_ sdoc, USER_OBJECT_ sfileName, USER_OBJECT_ compression, USER_OBJECT_ sindent,
USER_OBJECT_ prefix, USER_OBJECT_ r_encoding)
{
xmlDocPtr doc;
const char *fileName = NULL;
USER_OBJECT_ ans = NULL_USER_OBJECT;
xmlDtdPtr dtd = NULL;
int oldIndent = xmlIndentTreeOutput;
const char *encoding = CHAR_DEREF(STRING_ELT(r_encoding, 0));
if(TYPEOF(sdoc) != EXTPTRSXP) {
Rf_error("document passed to R_saveXMLDOM is not an external pointer");
}
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
if(doc == NULL)
return(NEW_CHARACTER(0));
xmlIndentTreeOutput = LOGICAL_DATA(sindent)[0];
if(GET_LENGTH(prefix) == 3) {
dtd = xmlNewDtd(doc, ValOrNULL(CHAR_DEREF(STRING_ELT(prefix, 0))),
ValOrNULL(CHAR_DEREF(STRING_ELT(prefix, 1))),
ValOrNULL(CHAR_DEREF(STRING_ELT(prefix, 2))));
dtd->parent = doc;
dtd->doc = doc;
dtd->prev = doc->children->prev;
dtd->next = doc->children;
doc->children->prev = (xmlNodePtr) dtd;
doc->children = (xmlNodePtr) dtd;
}
/* Figure out what the name of the file is, or if it is NULL. */
if(GET_LENGTH(sfileName))
fileName = CHAR_DEREF(STRING_ELT(sfileName, 0));
/* If the user specified a file name, write to it and honor
the compression setting they supplied.
*/
if(fileName && fileName[0]) {
int compressionLevel = -1;
if(GET_LENGTH(compression)) {
compressionLevel = xmlGetDocCompressMode(doc);
xmlSetDocCompressMode(doc, INTEGER_DATA(compression)[0]);
}
if(encoding && encoding[0])
// xmlSaveFileEnc doesn't indent. So use xmlSaveFormatFileEnc(). Issue identified by Earl Brown.
// xmlSaveFileEnc(CHAR_DEREF(STRING_ELT(sfileName, 0)), doc, encoding);
xmlSaveFormatFileEnc(CHAR_DEREF(STRING_ELT(sfileName, 0)), doc, encoding, LOGICAL_DATA(sindent)[0]);
#if 0
else
xmlSaveFile(CHAR_DEREF(STRING_ELT(sfileName, 0)), doc);
#else
else {
FILE *f;
f = fopen(CHAR_DEREF(STRING_ELT(sfileName, 0)), "w");
if(!f) {
Rf_error("cannot create file %s. Check the directory exists and permissions are appropriate", CHAR_DEREF(STRING_ELT(sfileName, 0)) );
}
xmlDocFormatDump(f, doc, 1);
fclose(f);
}
#endif
if(compressionLevel != -1) {
xmlSetDocCompressMode(doc, compressionLevel);
}
} else {
/* So we are writing to a buffer and returning the DOM as an S string. */
xmlChar *mem;
int size;
/*??? Do we need to allocate this memory? */
PROTECT(ans = NEW_CHARACTER(1));
if(encoding && encoding[0])
xmlDocDumpFormatMemoryEnc(doc, &mem, &size, encoding, LOGICAL_DATA(sindent)[0]);
else {
xmlDocDumpFormatMemory(doc, &mem, &size, 1);
/* xmlDocDumpMemory(doc, &mem, &size); original */
}
if(dtd) {
xmlNodePtr tmp;
doc->extSubset = NULL;
tmp = doc->children->next;
tmp->prev = NULL;
doc->children = tmp;
xmlFreeDtd(dtd);
}
if(mem) {
DECL_ENCODING_FROM_DOC(doc)
SET_STRING_ELT(ans, 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(mem)));
xmlFree(mem);
} else {
/*XXX get the error message from libxml2 */
Rf_error("failed to write XML document contents");
}
UNPROTECT(1);
return(ans);
}
xmlIndentTreeOutput = oldIndent;
return(ans);
}
USER_OBJECT_
RS_XML_setDoc(USER_OBJECT_ snode, USER_OBJECT_ sdoc)
{
/*Might use xmlCopyNode or xmlCopyNodeList if we have to make a copy*/
xmlDocPtr doc;
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
if(sdoc != NULL_USER_OBJECT) {
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
} else {
doc = xmlNewDoc(CHAR_TO_XMLCHAR("1.0"));
R_numXMLDocs++;
}
xmlDocSetRootElement(doc, node);
return(R_createXMLDocRef(doc));
}
#if 0
void
RS_XML_recursive_unsetDoc(xmlNodePtr node)
{
xmlNodePtr tmp;
node->doc = NULL;
tmp = node->children;
while(tmp) {
RS_XML_recursive_unsetDoc(tmp);
tmp = tmp->next;
}
}
#endif
/* The following two routines are from Paul Murrell.
They fix a problem with xpathApply() changing the document
presumably when doing an XPath query on a node within a document.
The old version didn't deal with the properties on the node.
*/
void
RS_XML_recursive_unsetTreeDoc(xmlNodePtr node) {
xmlAttrPtr prop;
if (node == NULL)
return;
if(node->type == XML_ELEMENT_NODE) {
prop = node->properties;
while (prop != NULL) {
prop->doc = NULL;
RS_XML_recursive_unsetListDoc(prop->children);
prop = prop->next;
}
}
if (node->children != NULL)
RS_XML_recursive_unsetListDoc(node->children);
node->doc = NULL;
}
void
RS_XML_recursive_unsetListDoc(xmlNodePtr list) {
xmlNodePtr cur;
if (list == NULL)
return;
cur = list;
while (cur != NULL) {
RS_XML_recursive_unsetTreeDoc(cur);
cur = cur->next;
}
}
USER_OBJECT_
RS_XML_unsetDoc(USER_OBJECT_ snode, USER_OBJECT_ unlink, USER_OBJECT_ r_parent, USER_OBJECT_ recursive)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
if(!node) {
return(NULL_USER_OBJECT);
}
if(node->doc && node->doc->children == node) {
xmlDocSetRootElement(node->doc, NULL);
}
if(LOGICAL(unlink)[0])
xmlUnlinkNode(node);
node->doc = NULL;
node->parent = NULL;
if(r_parent != R_NilValue) {
node->parent = (xmlNodePtr) R_ExternalPtrAddr(snode);
}
if(LOGICAL(recursive)[0]) {
RS_XML_recursive_unsetTreeDoc(node);
}
return(ScalarLogical(TRUE));
}
SEXP
RS_XML_setDocEl(SEXP r_node, SEXP r_doc)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(r_doc);
xmlSetTreeDoc(node, doc);
/* node->doc = doc; */
return(R_NilValue);
}
#ifdef ADD_XML_OUTPUT_BUFFER_CODE
/* These two taken from libxml2-2.6.27
They are needed if xmlOutputBufferCreateBuffer()
is not in the installed libxml2.
It appeared in libxml2-2.6.23, released on Jan 5 2006
*/
static int
xmlBufferWrite (void * context, const char * buffer, int len) {
int ret;
ret = xmlBufferAdd((xmlBufferPtr) context, (const xmlChar *) buffer, len);
if (ret != 0)
return(-1);
return(len);
}
xmlOutputBufferPtr
xmlOutputBufferCreateBuffer(xmlBufferPtr buffer,
xmlCharEncodingHandlerPtr encoder) {
xmlOutputBufferPtr ret;
if (buffer == NULL) return(NULL);
ret = xmlOutputBufferCreateIO((xmlOutputWriteCallback)
xmlBufferWrite,
(xmlOutputCloseCallback)
NULL, (void *) buffer, encoder);
return(ret);
}
#endif
/* Not completed.
This could put the node into a new document and then call R_saveXMLDOM()
but we are doing it in separate steps with separate C routines and
calling these from R.
xmlNodeDumpOutput
Test:
a = newXMLNode("a", "first bit", newXMLNode("b", "contents of b", newXMLNode("c", 3)), "more text")
a = newXMLNode("a", newXMLNode("b", newXMLNode("c", 3)))
.Call("RS_XML_printXMLNode", a, as.integer(1), as.integer(1), character())
*/
USER_OBJECT_
RS_XML_printXMLNode(USER_OBJECT_ r_node, USER_OBJECT_ level, USER_OBJECT_ format,
USER_OBJECT_ indent, USER_OBJECT_ r_encoding, USER_OBJECT_ r_encoding_int)
{
USER_OBJECT_ ans;
xmlNodePtr node;
const char *encoding = NULL;
xmlOutputBufferPtr buf;
xmlBufferPtr xbuf;
int oldIndent;
oldIndent = xmlIndentTreeOutput;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
xmlIndentTreeOutput = LOGICAL(indent)[0];
xbuf = xmlBufferCreate();
if(GET_LENGTH(r_encoding))
encoding = CHAR_DEREF(STRING_ELT(r_encoding, 0));
buf = xmlOutputBufferCreateBuffer(xbuf, NULL);
// xmlKeepBlanksDefault(0);
xmlNodeDumpOutput(buf, node->doc, node, INTEGER(level)[0], INTEGER(format)[0], encoding);
xmlOutputBufferFlush(buf);
xmlIndentTreeOutput = oldIndent;
if(xbuf->use > 0) {
/*XXX this const char * in CHARSXP means we have to make multiple copies. */
if(INTEGER(r_encoding_int)[0] == CE_NATIVE)
ans = ScalarString(CreateCharSexpWithEncoding((const xmlChar *)encoding, (const xmlChar *)xbuf->content));
else
ans = ScalarString(mkCharCE((const char *)xbuf->content, INTEGER(r_encoding_int)[0]));
} else
ans = NEW_CHARACTER(1);
xmlOutputBufferClose(buf);
return(ans);
}
SEXP
R_setXMLInternalTextNode_noenc(SEXP node)
{
xmlNodePtr n = (xmlNodePtr) R_ExternalPtrAddr(node);
if(!n) {
Rf_error("null value passed for XMLInternalTextNode");
}
n->name = (const xmlChar *) (&xmlStringTextNoenc);
return(ScalarLogical(TRUE));
}
SEXP
/*R_setXMLInternalTextNode_value(SEXP node, SEXP value, SEXP r_encoding)*/
R_setXMLInternalTextNode_value(SEXP node, SEXP value)
{
xmlNodePtr n = (xmlNodePtr) R_ExternalPtrAddr(node);
// xmlChar *tmp;
const char *str;
// DECL_ENCODING_FROM_NODE(n)
if(n->type != XML_TEXT_NODE) {
Rf_error( "Can only set value on an text node");
}
str = CHAR(STRING_ELT(value, 0));
xmlNodeSetContent(n, (const xmlChar *)str);
return(node);
}
SEXP
R_xmlSetContent(SEXP node, SEXP content)
{
xmlNodePtr n = (xmlNodePtr) R_ExternalPtrAddr(node);
xmlNodeSetContent(n, CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(content, 0))));
return(R_NilValue);
}
SEXP
R_xmlNodeValue(SEXP node, SEXP raw, SEXP r_encoding)
{
xmlNodePtr n = (xmlNodePtr) R_ExternalPtrAddr(node);
xmlChar *tmp;
SEXP ans;
DECL_ENCODING_FROM_NODE(n)
if(!n) {
Rf_error( "null value for xml node reference");
}
tmp = xmlNodeGetContent(n);
/*
xmlGetNodeRawString
xmlGetNodeString
if(GET_LENGTH(raw) == 0)
else if(LOGICAL(raw)[0]) {
} else {
}
*/
if(tmp) {
if(INTEGER(r_encoding)[0] == CE_NATIVE)
ans = ScalarString(CreateCharSexpWithEncoding(encoding, tmp));
else
ans = ScalarString(mkCharCE((const char *)tmp, INTEGER(r_encoding)[0]));
free(tmp);
// ans = mkString(XMLCHAR_TO_CHAR(tmp));
// Just playing: ans = ScalarString(mkCharCE(tmp, CE_UTF8));
} else
ans = NEW_CHARACTER(0);
return(ans);
}
USER_OBJECT_
R_xmlNsAsCharacter(USER_OBJECT_ s_ns)
{
xmlNsPtr ns = NULL;
USER_OBJECT_ ans, names;
const xmlChar *encoding = NULL;
ns = (xmlNsPtr) R_ExternalPtrAddr(s_ns);
#ifdef LIBXML_NAMESPACE_HAS_CONTEXT
encoding = ns->context ? ns->context->encoding : NULL;
#endif
PROTECT(ans = NEW_CHARACTER(2));
PROTECT(names = NEW_CHARACTER(2));
SET_STRING_ELT(names, 0, mkChar("prefix"));
SET_STRING_ELT(names, 1, mkChar("href"));
if(ns->prefix)
SET_STRING_ELT(ans, 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(ns->prefix)));
if(ns->href)
SET_STRING_ELT(ans, 1, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(ns->href)));
SET_NAMES(ans, names);
UNPROTECT(2);
return(ans);
}
USER_OBJECT_
R_getXMLNodeDocument(USER_OBJECT_ s_node)
{
xmlNodePtr n = (xmlNodePtr) R_ExternalPtrAddr(s_node);
if(!n->doc)
return(NULL_USER_OBJECT);
/*??? Does this arrange to free it? */
return(R_createXMLDocRef(n->doc));
}
USER_OBJECT_
RS_XML_isDescendantOf(USER_OBJECT_ r_node, USER_OBJECT_ r_top, USER_OBJECT_ strict)
{
xmlNodePtr node, ptr, top;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
top = (xmlNodePtr) R_ExternalPtrAddr(r_top);
if(!node || !top) {
Rf_error( "null value passed to RS_XML_isDescendantOf");
}
/*XXX */
if(node->type == XML_NAMESPACE_DECL)
return(ScalarLogical(TRUE));
ptr = node;
while(ptr && ptr->type != XML_DOCUMENT_NODE && ptr->type != XML_HTML_DOCUMENT_NODE) {
if(ptr == top)
return(ScalarLogical(ptr == node && LOGICAL(strict)[0] ? FALSE : TRUE));
ptr = ptr->parent;
}
return(ScalarLogical(FALSE));
}
SEXP
R_XML_indexOfChild(SEXP r_node)
{
xmlNodePtr node, ptr; // parent
int i = 0;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
ptr = node->parent->children;
while(ptr) {
if(ptr == node)
return(ScalarInteger(i + 1));
i++;
ptr = ptr->next;
}
return(R_NilValue);
}
SEXP
R_setNamespaceFromAncestors(SEXP r_node)
{
xmlNodePtr node, ptr;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
ptr = node->parent;
while(ptr) {
if((ptr->type != XML_HTML_DOCUMENT_NODE && ptr->type != XML_DOCUMENT_NODE) &&
ptr->ns && ptr->ns->href && (!ptr->ns->prefix || !ptr->ns->prefix[0])) {
xmlSetNs(node, ptr->ns);
return(ScalarLogical(TRUE));
}
ptr = ptr->parent;
}
return(ScalarLogical(FALSE));
}
#ifdef R_HAS_REMOVE_FINALIZERS
int
xmlNode_removeFinalizers(xmlNodePtr node)
{
xmlNodePtr tmp;
int count = 0;
#if R_XML_DEBUG
fprintf(stderr, "xml removeFinalizers %p %s\n", node, node->name);
#endif
count = R_RemoveExtPtrWeakRef_direct(node);
tmp = node->children;
while(tmp) {
count += xmlNode_removeFinalizers(tmp);
tmp = tmp->next;
}
return(count);
}
SEXP
R_xmlNode_removeFinalizers(SEXP r_node)
{
int num;
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
num = xmlNode_removeFinalizers(node);
return(ScalarInteger(num));
}
#endif
SEXP
R_xmlSearchNs(SEXP r_doc, SEXP r_node, SEXP r_ns, SEXP r_asPrefix)
{
const xmlChar * val;
xmlNsPtr ns;
xmlDocPtr doc = (r_doc == NULL_USER_OBJECT) ? NULL : R_ExternalPtrAddr(r_doc);
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
if(Rf_length(r_ns) == 0)
return(NEW_CHARACTER(0));
val = (const xmlChar *)CHAR_DEREF(STRING_ELT(r_ns, 0));
ns = LOGICAL(r_asPrefix)[0] ? xmlSearchNs(doc, node, val) : xmlSearchNsByHref(doc, node, val);
if(!ns)
return(NEW_CHARACTER(0));
else {
SEXP r_ans;
PROTECT(r_ans = mkString((const char *)ns->href));
SET_NAMES(r_ans, mkString(ns->prefix ? XMLCHAR_TO_CHAR(ns->prefix) : ""));
UNPROTECT(1);
return(r_ans);
}
}
USER_OBJECT_
R_getChildByIndex(USER_OBJECT_ r_node, USER_OBJECT_ r_index, USER_OBJECT_ r_addFinalizer)
{
xmlNodePtr node, ptr;
int i = 0, idx;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
ptr = node->children;
idx = INTEGER(r_index)[0];
while(ptr && i < idx) {
ptr = ptr->next;
i++;
}
return(R_createXMLNodeRef(ptr, r_addFinalizer));
}
USER_OBJECT_
R_getChildByName(USER_OBJECT_ r_node, USER_OBJECT_ r_index, USER_OBJECT_ r_addFinalizer)
{
xmlNodePtr node, ptr;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
ptr = node->children;
const char *name = CHAR_DEREF(STRING_ELT(r_index, 0));
while(ptr) {
if(ptr->name && strcmp(name, (const char *)ptr->name) == 0)
break;
ptr = ptr->next;
}
return(R_createXMLNodeRef(ptr, r_addFinalizer));
}
/*
This is a C-level version equivalent to
xmlApply(node, xmlValue)
*/
USER_OBJECT_
R_childStringValues(SEXP r_node, SEXP r_len, SEXP r_asVector, SEXP r_encoding, SEXP r_addNames)
{
xmlNodePtr node, kid;
int len, i;
SEXP ans, names = NULL;
int asVector = LOGICAL(r_asVector)[0];
int encoding = INTEGER(r_encoding)[0];
xmlChar *tmp;
int nprotect = 0;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
len = INTEGER(r_len)[0];
if(asVector)
ans = NEW_CHARACTER(len);
else
ans = NEW_LIST(len);
PROTECT(ans); nprotect++;
if(LOGICAL(r_addNames)[0]) {
PROTECT(names = NEW_CHARACTER(len));
nprotect++;
}
for(i = 0, kid = node->children; kid && i < len; i++, kid = kid->next) {
tmp = xmlNodeGetContent(kid);
SEXP val = mkCharCE((const char *)tmp, encoding);
PROTECT(val);
if(asVector)
SET_STRING_ELT(ans, i, val);
else
SET_VECTOR_ELT(ans, i, ScalarString(val));
if(names && kid->name) {
SET_STRING_ELT(names, i, mkCharCE((const char *)kid->name, encoding));
}
UNPROTECT(1);
}
if(names)
SET_NAMES(ans, names);
UNPROTECT(nprotect);
return(ans);
}
USER_OBJECT_
R_replaceNodeWithChildren(USER_OBJECT_ r_node)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
xmlNodePtr nxt = node->next;
if(node->prev) {
node->prev->next = node->children;
node->children->prev = node->prev;
} else if(node->parent)
node->parent->children = node->children;
if(node->children) {
xmlNodePtr cur = node->children;
while(cur->next) {
cur->parent = node->parent;
cur = cur->next;
}
cur->next = nxt;
if(nxt)
nxt->prev = cur;
}
return(NULL_USER_OBJECT);
}
XML/src/RSDTD.h 0000644 0001751 0000144 00000002777 13607633744 012602 0 ustar hornik users /*
* See Copyright for the license status of this software.
*/
#ifndef RSDTD_H
#define RSDTD_H
#include "RSCommon.h"
#define RS_XML(a) RS_XML_##a
#ifdef FROM_GNOME_XML_DIR
#include
#include
#include
#include
#include
#else
#if 0
/* Problems with xmlValidCtxt in libxml2-2.4.[n] where n >= 21*/
#include
#endif
#include
#include
#include
#include
#endif
USER_OBJECT_ RS_XML(createDTDElement)(xmlElementPtr el);
USER_OBJECT_ RS_XML(createDTDElementContents)(xmlElementContentPtr vals, xmlElementPtr el, int recursive);
USER_OBJECT_ RS_XML(createDTDElementAttributes)(xmlAttributePtr vals, xmlElementPtr el);
USER_OBJECT_ RS_XML(createDTDAttribute)(xmlAttributePtr val, xmlElementPtr el);
USER_OBJECT_ RS_XML(AttributeEnumerationList)(xmlEnumerationPtr list, xmlAttributePtr attr, xmlElementPtr element);
USER_OBJECT_ RS_XML(SequenceContent)(xmlElementContentPtr vals, xmlElementPtr el);
USER_OBJECT_ RS_XML(ProcessElements)(xmlElementTablePtr table, xmlParserCtxtPtr ctxt);
USER_OBJECT_ RS_XML(ProcessEntities)(xmlEntitiesTablePtr table, xmlParserCtxtPtr ctxt);
USER_OBJECT_ RS_XML(createDTDEntity)(xmlEntityPtr entity);
USER_OBJECT_ RS_XML(createDTDParts)(xmlDtdPtr dtd, xmlParserCtxtPtr ctxt);
USER_OBJECT_ RS_XML(ConstructDTDList)(xmlDocPtr myDoc, int processInternals, xmlParserCtxtPtr ctxt);
#endif
XML/src/xpath.c 0000644 0001751 0000144 00000043413 14327573457 013035 0 ustar hornik users #include "RS_XML.h"
#include
#include "Utils.h"
void xpathTolower(xmlXPathParserContextPtr ctxt, int nargs);
void xpathGrepl(xmlXPathParserContextPtr ctxt, int nargs);
void xpathReplace(xmlXPathParserContextPtr ctxt, int nargs);
void xpathEndswith(xmlXPathParserContextPtr ctxt, int nargs);
void xpathAbs(xmlXPathParserContextPtr ctxt, int nargs);
void xpathBaseURI(xmlXPathParserContextPtr ctxt, int nargs);
void xpathMin(xmlXPathParserContextPtr ctxt, int nargs);
void xpathMax(xmlXPathParserContextPtr ctxt, int nargs);
void R_genericXPathFun(xmlXPathParserContextPtr ctxt, int nargs);
void R_genericAnonXPathFun(xmlXPathParserContextPtr ctxt, int nargs);
static SEXP
convertNodeSetToR(xmlNodeSetPtr obj, SEXP fun, int encoding, SEXP manageMemory)
{
SEXP ans, expr = NULL, arg = NULL, ref;
int i;
int nprot = 0;
if(!obj)
return(NULL_USER_OBJECT);
PROTECT(ans = NEW_LIST(obj->nodeNr)); nprot++;
if(GET_LENGTH(fun) && (TYPEOF(fun) == CLOSXP || TYPEOF(fun) == BUILTINSXP)) {
PROTECT(expr = allocVector(LANGSXP, 2)); nprot++;
SETCAR(expr, fun);
arg = CDR(expr);
} else if(TYPEOF(fun) == LANGSXP) {
// change from Tomas Kalibera 2016-11-10
PROTECT(expr = duplicate(fun)); nprot++;
arg = CDR(expr);
}
for(i = 0; i < obj->nodeNr; i++) {
xmlNodePtr el;
el = obj->nodeTab[i];
if(el->type == XML_ATTRIBUTE_NODE) {
#if 0
PROTECT(ref = mkString((el->children && el->children->content) ? XMLCHAR_TO_CHAR(el->children->content) : ""));
nprot++;
SET_NAMES(ref, mkString(el->name));
#else
PROTECT(ref = ScalarString(mkCharCE((el->children && el->children->content) ? XMLCHAR_TO_CHAR(el->children->content) : "", encoding)));
SET_NAMES(ref, ScalarString(mkCharCE((const char *)el->name, encoding)));
#endif
SET_CLASS(ref, mkString("XMLAttributeValue"));
UNPROTECT(1);
} else if(el->type == XML_NAMESPACE_DECL)
ref = R_createXMLNsRef((xmlNsPtr) el);
else
ref = R_createXMLNodeRef(el, manageMemory);
if(expr) {
PROTECT(ref);
SETCAR(arg, ref);
PROTECT(ref = Rf_eval(expr, R_GlobalEnv)); /*XXX do we want to catch errors here? Maybe to release the namespaces. */
SET_VECTOR_ELT(ans, i, ref);
UNPROTECT(2);
} else
SET_VECTOR_ELT(ans, i, ref);
}
// change from Tomas Kalibera 2016-11-10
if(!expr)
SET_CLASS(ans, mkString("XMLNodeSet"));
UNPROTECT(nprot);
return(ans);
}
SEXP
convertXPathObjectToR(xmlXPathObjectPtr obj, SEXP fun, int encoding, SEXP manageMemory)
{
SEXP ans = NULL_USER_OBJECT;
switch(obj->type) {
case XPATH_NODESET:
ans = convertNodeSetToR(obj->nodesetval, fun, encoding, manageMemory);
break;
case XPATH_BOOLEAN:
ans = ScalarLogical(obj->boolval);
break;
case XPATH_NUMBER:
ans = ScalarReal(obj->floatval);
if(xmlXPathIsInf(obj->floatval))
REAL(ans)[0] = xmlXPathIsInf(obj->floatval) < 0 ? R_NegInf : R_PosInf;
else if(xmlXPathIsNaN(obj->floatval))
REAL(ans)[0] = NA_REAL;
break;
case XPATH_STRING:
ans = mkString(XMLCHAR_TO_CHAR(obj->stringval)); //XXX encoding
break;
// Next three not currently in xmlXPathObjectType
#ifdef LIBXML_XPTR_LOCS_ENABLED
case XPATH_POINT:
case XPATH_RANGE:
case XPATH_LOCATIONSET:
#endif
case XPATH_USERS:
Rf_warning("currently unsupported xmlXPathObject type %d in convertXPathObjectToR. Please send mail to maintainer.", obj->type);
default:
ans = R_NilValue;
}
return(ans);
}
#include /* For xmlXPathRegisterNs() */
xmlNsPtr *
R_namespaceArray(SEXP namespaces, xmlXPathContextPtr ctxt)
{
int i, n;
SEXP names = GET_NAMES(namespaces);
xmlNsPtr *els;
n = GET_LENGTH(namespaces);
els = xmlMallocAtomic(sizeof(xmlNsPtr) * n);
if(!els) {
Rf_error( "Failed to allocate space for namespaces");
}
for(i = 0; i < n; i++) {
/*XXX who owns these strings. */
const xmlChar *prefix, *href;
href = CHAR_TO_XMLCHAR(strdup(CHAR_DEREF(STRING_ELT(namespaces, i))));
prefix = names == NULL_USER_OBJECT ? CHAR_TO_XMLCHAR("") /* NULL */
: CHAR_TO_XMLCHAR(strdup(CHAR_DEREF(STRING_ELT(names, i))));
els[i] = xmlNewNs(NULL, href, prefix);
if(ctxt)
xmlXPathRegisterNs(ctxt, prefix, href);
}
return(els);
}
#if R_XML_DEBUG_WEAK_REFS
SEXP LastDoc = NULL;
SEXP
R_isWeakRef(SEXP sdoc)
{
void *ptr;
if(sdoc == R_NilValue) {
if(LastDoc == NULL)
return(R_NilValue);
sdoc = LastDoc;
}
ptr = R_ExternalPtrAddr(sdoc);
return(ScalarLogical(R_findExtPtrWeakRef(ptr)));
}
#endif
SEXP
R_addXMLInternalDocument_finalizer(SEXP sdoc, SEXP fun)
{
R_CFinalizer_t action = NULL;
#if R_XML_DEBUG_WEAK_REFS
LastDoc = sdoc;
#endif
if(TYPEOF(fun) == CLOSXP) {
R_RegisterFinalizer(sdoc, fun);
return(sdoc);
}
if(fun == R_NilValue) {
action = R_xmlFreeDoc;
} else if(TYPEOF(fun) == EXTPTRSXP)
action = (R_CFinalizer_t) R_ExternalPtrAddr(fun);
R_RegisterCFinalizer(sdoc, action);
#ifdef R_XML_DEBUG_WEAK_REFS
void *ptr = R_ExternalPtrAddr(sdoc);
int status = R_findExtPtrWeakRef(ptr);
fprintf(stderr, "is weak ref %d\n", status);
#endif
return(sdoc);
}
SEXP
R_XMLInternalDocument_free(SEXP sdoc)
{
if(TYPEOF(sdoc) != EXTPTRSXP || R_ExternalPtrTag(sdoc) != Rf_install("XMLInternalDocument")) {
Rf_error("R_free must be given an internal XML document object, 'XMLInternalDocument'");
}
R_xmlFreeDoc(sdoc);
return(sdoc);
}
/* This may go into the context object */
static SEXP R_AnonXPathFuns = NULL;
SEXP
RS_XML_xpathEval(SEXP sdoc, SEXP r_node, SEXP path, SEXP namespaces, SEXP fun, SEXP charEncoding,
SEXP manageMemory, SEXP xpathFuns, SEXP anonFuns)
{
xmlXPathContextPtr ctxt = NULL;
xmlXPathObjectPtr result;
SEXP ans = NULL_USER_OBJECT;
xmlDocPtr doc;
if(TYPEOF(sdoc) != EXTPTRSXP || R_ExternalPtrTag(sdoc) != Rf_install("XMLInternalDocument")) {
Rf_error("xpathEval must be given an internal XML document object, 'XMLInternalDocument'");
}
doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
ctxt = xmlXPathNewContext(doc);
if(GET_LENGTH(r_node)) {
ctxt->node = ctxt->origin = R_ExternalPtrAddr(r_node);
}
if(GET_LENGTH(namespaces)) {
ctxt->namespaces = R_namespaceArray(namespaces, ctxt); /* xmlCopyNamespaceList(doc); */
ctxt->nsNr = GET_LENGTH(namespaces);
}
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"lower-case", xpathTolower);
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"ends-with", xpathEndswith);
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"matches", xpathGrepl);
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"replace", xpathReplace);
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"abs", xpathAbs);
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"base-uri", xpathBaseURI);
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"min", xpathMin);
xmlXPathRegisterFunc(ctxt, (const xmlChar *)"max", xpathMax);
R_AnonXPathFuns = anonFuns;
if(Rf_length(xpathFuns)) {
SEXP names = GET_NAMES(xpathFuns), el;
int i;
xmlXPathFunction routine;
const xmlChar *id;
for(i = 0; i < Rf_length(xpathFuns); i++) {
el = VECTOR_ELT(xpathFuns, i);
id = (names != R_NilValue) ? (const xmlChar *)CHAR(STRING_ELT(names, i)) : NULL;
if(TYPEOF(el) == EXTPTRSXP) {
routine = R_ExternalPtrAddr(el);
if(!id) {
Rf_error("no name for XPath function routine");
}
} else if(TYPEOF(el) == CLOSXP) {
routine = R_genericAnonXPathFun;
} else {
routine = R_genericXPathFun;
if(TYPEOF(el) == STRSXP)
id = (const xmlChar *)CHAR(STRING_ELT(el, 0));
}
xmlXPathRegisterFunc(ctxt, id, routine);
}
}
result = xmlXPathEvalExpression(CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(path, 0))), ctxt);
if(result)
ans = convertXPathObjectToR(result, fun, INTEGER(charEncoding)[0], manageMemory);
xmlXPathFreeObject(result);
xmlXPathFreeContext(ctxt);
R_AnonXPathFuns = NULL;
if(!result) {
Rf_error( "error evaluating xpath expression %s", CHAR_DEREF(STRING_ELT(path, 0)));
}
return(ans);
}
USER_OBJECT_
RS_XML_createDocFromNode(USER_OBJECT_ s_node)
{
xmlDocPtr doc;
xmlNodePtr node, ptr;
SEXP ans;
node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
doc = xmlNewDoc(CHAR_TO_XMLCHAR("1.0"));
R_numXMLDocs++;
ptr = xmlDocCopyNode(node, doc, 1);
node = (xmlNodePtr) doc;
xmlAddChild(node, ptr);
ans = R_createXMLDocRef(doc);
return(ans);
}
USER_OBJECT_
RS_XML_copyNodesToDoc(USER_OBJECT_ s_node, USER_OBJECT_ s_doc, USER_OBJECT_ manageMemory)
{
xmlDocPtr doc;
xmlNodePtr node, ptr;
int len, i;
SEXP ans;
doc = (xmlDocPtr) R_ExternalPtrAddr(s_doc);
if(TYPEOF(s_node) == EXTPTRSXP) {
node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
ptr = xmlDocCopyNode(node, doc, 1);
return(R_createXMLNodeRef(ptr, manageMemory));
}
len = Rf_length(s_node);
PROTECT(ans = NEW_LIST(len));
for(i = 0; i < len; i++) {
node = (xmlNodePtr) R_ExternalPtrAddr(VECTOR_ELT(s_node, i));
ptr = xmlDocCopyNode(node, doc, 1);
SET_VECTOR_ELT(ans, i, R_createXMLNodeRef(ptr, manageMemory));
}
UNPROTECT(1);
return(ans);
}
/*
Thoughts that we could set the kids to NULL and then free the doc
after we createDocFromNode but the return of xpathApply will return
these nodes and we need to be able to get to a document
*/
SEXP
RS_XML_killNodesFreeDoc(SEXP sdoc)
{
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
if(!doc) {
Rf_warning("null xmlDocPtr passed as externalptr to RS_XML_killNodesFreeDoc");
return(ScalarLogical(FALSE));
}
doc->children = NULL;
xmlFree(doc);
return(ScalarLogical(TRUE));
}
#if 0
SEXP
RS_XML_xpathNodeEval(SEXP s_node, SEXP path, SEXP namespaces, SEXP fun)
{
xmlXPathContextPtr ctxt = NULL;
xmlXPathObjectPtr result;
SEXP ans = NULL_USER_OBJECT;
xmlDocPtr doc;
if(TYPEOF(s_node) != EXTPTRSXP || R_ExternalPtrTag(s_node) != Rf_install("XMLInternalNode")) {
Rf_error("xpathEval must be given an internal XML document object, 'XMLInternalNode'");
}
ctxt = xmlXPathNewContext(doc);
if(GET_LENGTH(namespaces)) {
ctxt->namespaces = R_namespaceArray(namespaces, ctxt); /* xmlCopyNamespaceList(doc); */
ctxt->nsNr = GET_LENGTH(namespaces);
}
result = xmlXPathEvalExpression(CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(path, 0))), ctxt);
if(result)
ans = convertXPathObjectToR(result, fun);
xmlXPathFreeObject(result);
xmlXPathFreeContext(ctxt);
if(!result) {
Rf_error( "error evaluating xpath expression %s", CHAR_DEREF(STRING_ELT(path, 0)));
}
return(ans);
}
#endif
SEXP
R_matchNodesInList(SEXP r_nodes, SEXP r_target, SEXP r_nomatch)
{
xmlNodePtr el;
int i, j, n, n2;
SEXP ans;
n = GET_LENGTH(r_nodes);
n2 = GET_LENGTH(r_target);
ans = NEW_INTEGER( n );
for(i = 0; i < n ; i++) {
el = R_ExternalPtrAddr(VECTOR_ELT(r_nodes, i));
INTEGER(ans)[i] = INTEGER(r_nomatch)[0];
for(j = 0; j < n2; j++) {
if(el == R_ExternalPtrAddr(VECTOR_ELT(r_target, j))) {
INTEGER(ans)[i] = j;
break;
}
}
}
return(ans);
}
#if 0 /* taken from Sxslt and should be left there or moved here.*/
USER_OBJECT_
RXSLT_export_xmlXPathObject(xmlNodeSetPtr val, const char * className)
{
USER_OBJECT_ ans;
USER_OBJECT_ klass;
PROTECT(klass = MAKE_CLASS(className));
PROTECT(ans = NEW(klass));
SET_SLOT(ans, Rf_install("ref"), R_MakeExternalPtr(val, Rf_install(className), R_NilValue));
UNPROTECT(2);
return(ans);
}
USER_OBJECT_
R_xmlXPathNewNodeSet(USER_OBJECT_ s_node)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(s_node);
xmlXPathObjectPtr nodeset;
nodeset = xmlXPathNewNodeSet(node);
return(RXSLT_export_xmlXPathObject(nodeset->nodesetval, "XPathNodeSet"));
}
#endif
#include
/*
XXX Does not handle unicode in any way. Very simple-minded for now.
*/
void
xpathTolower(xmlXPathParserContextPtr ctxt, int nargs)
{
if(nargs == 0)
return;
xmlXPathObjectPtr obj = valuePop(ctxt);
if (obj->type != XPATH_STRING) {
valuePush(ctxt, obj);
xmlXPathStringFunction(ctxt, 1);
obj = valuePop(ctxt);
}
xmlChar *str = xmlStrdup(obj->stringval);
// xmlChar *ptr = str;
int i, n = xmlStrlen(str);
for(i = 0; i < n ; i++)
str[i] = (xmlChar)tolower(str[i]);
valuePush(ctxt, xmlXPathNewString(str));
}
void
xpathEndswith(xmlXPathParserContextPtr ctxt, int nargs)
{
if(nargs < 2)
return;
xmlChar *pattern = xmlXPathPopString(ctxt);
xmlChar *input = xmlXPathPopString(ctxt);
int i, n = xmlStrlen(input), lenPattern = xmlStrlen(pattern);
if(n < lenPattern)
xmlXPathReturnBoolean(ctxt, 0);
xmlChar *ptr = input + n - lenPattern;
for(i = 0; i < lenPattern ; i++) {
if(ptr[i] != pattern[i])
break;
}
xmlXPathReturnBoolean(ctxt, i == lenPattern);
}
void
xpathAbs(xmlXPathParserContextPtr ctxt, int nargs)
{
if(nargs < 1)
return;
double num = xmlXPathPopNumber(ctxt);
xmlXPathReturnNumber(ctxt, num < 0 ? - num : num);
}
void
xpathBaseURI(xmlXPathParserContextPtr ctxt, int nargs)
{
xmlDocPtr doc;
if(nargs == 0) {
doc = ctxt->context->doc;
} else {
xmlXPathObjectPtr obj = valuePop(ctxt);
if(obj->type != XPATH_NODESET)
return;
xmlNodePtr node = obj->nodesetval->nodeTab[0];
doc = node->doc;
}
xmlXPathReturnString(ctxt, xmlStrdup((doc && doc->URL) ?
doc->URL : (const xmlChar *)""));
}
#define MIN(x, y) (x) < (y) ? (x) : (y)
#define MAX(x, y) (x) > (y) ? (x) : (y)
void
xpathExtreme(xmlXPathParserContextPtr ctxt, int nargs, int isMax)
{
if(nargs < 1)
return;
double ans = 0;
int set = 0;
int i, a;
xmlXPathObjectPtr obj;
double tmp = 0.0;
for(a = 0; a < nargs; a++) {
obj = valuePop(ctxt);
if(obj->type == XPATH_NODESET) {
for(i = 0; i < obj->nodesetval->nodeNr; i++) {
tmp = xmlXPathCastNodeToNumber(obj->nodesetval->nodeTab[i]);
if(set)
ans = isMax ? MAX(tmp, ans) : MIN(tmp, ans);
else {
ans = tmp;
set = 1;
}
}
} else if(obj->type == XPATH_NUMBER) {
if(set)
ans = isMax ? MAX(tmp, ans) : MIN(tmp, ans);
else {
ans = tmp;
set = 1;
}
}
xmlXPathFreeObject(obj);
}
xmlXPathReturnNumber(ctxt, ans);
}
void
xpathMin(xmlXPathParserContextPtr ctxt, int nargs)
{
xpathExtreme(ctxt, nargs, 0);
}
void
xpathMax(xmlXPathParserContextPtr ctxt, int nargs)
{
xpathExtreme(ctxt, nargs, 1);
}
void
xpathGrepl(xmlXPathParserContextPtr ctxt, int nargs)
{
if(nargs < 2)
return;
xmlChar *pattern = xmlXPathPopString(ctxt);
xmlChar *input = xmlXPathPopString(ctxt);
SEXP e = Rf_allocVector(LANGSXP, 3);
PROTECT(e);
SETCAR(e, Rf_install("grepl"));
SETCAR(CDR(e), ScalarString(mkChar((const char *)pattern)));
SETCAR(CDR(CDR(e)), ScalarString(mkChar((const char *)input)));
SEXP ans = Rf_eval(e, R_GlobalEnv);
xmlXPathReturnBoolean(ctxt, INTEGER(ans)[0]);
UNPROTECT(1);
}
void
xpathReplace(xmlXPathParserContextPtr ctxt, int nargs)
{
if(nargs < 3)
return;
xmlChar *replacement = xmlXPathPopString(ctxt);
xmlChar *pattern = xmlXPathPopString(ctxt);
xmlChar *input = xmlXPathPopString(ctxt);
SEXP e = Rf_allocVector(LANGSXP, 4);
PROTECT(e);
SEXP cur = e;
SETCAR(e, Rf_install("gsub")); cur = CDR(cur);
SETCAR(cur, ScalarString(mkChar((const char *)pattern))); cur = CDR(cur);
SETCAR(cur, ScalarString(mkChar((const char *)replacement))); cur = CDR(cur);
SETCAR(cur, ScalarString(mkChar((const char *)input)));
SEXP ans = Rf_eval(e, R_GlobalEnv);
xmlXPathReturnString(ctxt, xmlStrdup((const xmlChar *)CHAR(STRING_ELT(ans, 0))));
UNPROTECT(1);
}
SEXP
convertXPathVal(xmlXPathObjectPtr xval)
{
SEXP ans = R_NilValue;
switch(xval->type) {
case XPATH_BOOLEAN:
ans = ScalarLogical(xval->boolval);
break;
case XPATH_NUMBER:
ans = ScalarReal(xval->floatval);
break;
case XPATH_STRING:
ans = ScalarString(mkChar((const char *)xval->stringval));
break;
case XPATH_NODESET:
{
SEXP tmp = ScalarLogical(FALSE);
PROTECT(tmp);
ans = convertXPathObjectToR(xval, R_NilValue, 0, tmp);
UNPROTECT(1);
}
break;
default:
Rf_warning("converting an XPath type %d to R not supported now", xval->type);
}
return(ans);
}
void
R_pushResult(xmlXPathParserContextPtr ctxt, SEXP ans)
{
switch(TYPEOF(ans)) {
case LGLSXP:
xmlXPathReturnBoolean(ctxt, INTEGER(ans)[0]);
break;
case INTSXP:
xmlXPathReturnNumber(ctxt, INTEGER(ans)[0]);
break;
case REALSXP:
xmlXPathReturnNumber(ctxt, REAL(ans)[0]);
break;
case STRSXP:
xmlXPathReturnString(ctxt, xmlStrdup((const xmlChar *)CHAR(STRING_ELT(ans, 0))));
break;
default:
Rf_error("R type not supported as result of XPath function");
}
}
void
R_callGenericXPathFun(xmlXPathParserContextPtr ctxt, int nargs, SEXP fun)
{
SEXP e = Rf_allocVector(LANGSXP, nargs + 1);
PROTECT(e);
SETCAR(e, fun);
SEXP cur = CDR(e);
xmlXPathObjectPtr tmp;
int j;
for(int i = nargs ; i > 0; i--) {
/* The arguments are on the stack with the last on the top, second to last next, and so on.
So we have to add them to our expression starting at the end. */
cur = e;
for(j = 0 ; j < i; j++) cur = CDR(cur);
tmp = valuePop(ctxt);
SETCAR(cur, convertXPathVal(tmp));
xmlXPathFreeObject(tmp);
}
SEXP ans = Rf_eval(e, R_GlobalEnv);
PROTECT(ans);
R_pushResult(ctxt, ans);
UNPROTECT(2);
}
void
R_genericAnonXPathFun(xmlXPathParserContextPtr ctxt, int nargs)
{
if(!R_AnonXPathFuns || R_AnonXPathFuns == R_NilValue)
return;
int i, n = Rf_length(R_AnonXPathFuns);
SEXP names = GET_NAMES(R_AnonXPathFuns);
for(i = 0; i < n; i++) {
if(strcmp((const char *)ctxt->context->function, CHAR(STRING_ELT(names, i))) == 0) {
R_callGenericXPathFun(ctxt, nargs, VECTOR_ELT(R_AnonXPathFuns, i));
return;
}
}
}
void
R_genericXPathFun(xmlXPathParserContextPtr ctxt, int nargs)
{
SEXP f = Rf_install((const char *)ctxt->context->function);
PROTECT(f);
R_callGenericXPathFun(ctxt, nargs, f);
UNPROTECT(1);
}
XML/src/EventParse.h 0000644 0001751 0000144 00000012343 13610555146 013755 0 ustar hornik users /*
* See Copyright for the license status of this software.
*/
#ifndef EVENT_PARSE_H
#define EVENT_PARSE_H
#include
#include
#include "RSCommon.h"
#include "RS_XML.h"
#ifdef LIBEXPAT
#include "xmlparse.h"
#else
typedef char XML_Char;
#endif
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
/* Extensible Struct for carrying information about the parser and its
options as specified by the caller from R or S.
*/
typedef struct {
/**
The name of the source file which is being parsed.
*/
char *fileName;
/**
Flag indicating whether blank (white-space only) text entries should be discarded
and not reported.
*/
int ignoreBlankLines;
/**
Flag indicating whether the methods in the user-level functions
should be invoked with additional information about the current
context of the parser, specifically the level or depth of the current node in the
tree, potentially the index sequence (i_child1, i_child2, i_child3,...) which
identifies the node relative to the root of the tree.
Specify this in the call to xmlEventParse().
*/
int addContextInfo;
/* Flag indicating whether an attempt should be made when calling
startElement to lookup a method matching the tag name rather than
the vanilla startElement method.
Set this in the call xmlEventParse().
*/
int callByTagName;
/* The R object in which to search for appropriate methods, usually a closure. */
USER_OBJECT_ methods;
USER_OBJECT_ endElementHandlers;
/*
The current depth in the XML document tree.
Used when constructing
*/
int depth;
/*
Flag indicating whether we should trim the
*/
int trim;
/* S object used in event parsing to share state across calls. */
USER_OBJECT_ stateObject;
/* For identifying which element names are to be created into regular nodes. */
USER_OBJECT_ branches;
xmlNodePtr current;
xmlNodePtr top;
int branchIndex;
/* */
int useDotNames;
/* The XML context */
xmlParserCtxtPtr ctx;
/* A function which is used to process a branch "anonymously, i.e
not one that is actively identified in the branches = list(....)
but a function that is returned from a regular startElement handler
that indicates collect up the node and call this.
*/
USER_OBJECT_ dynamicBranchFunction;
USER_OBJECT_ finalize;
} RS_XMLParserData;
/* The name of the R element to call fo the general case.
if useDotNames is on, then we paste a . to the regular name.
*/
#define HANDLER_FUN_NAME(ctx, txt) \
((RS_XMLParserData *)(ctx))->useDotNames ? "." txt : txt
void
R_processBranch(RS_XMLParserData * rinfo,
int branchIndex,
const xmlChar * localname,
const xmlChar * prefix,
const xmlChar * URI,
int nb_namespaces,
const xmlChar ** namespaces,
int nb_attributes,
int nb_defaulted,
const xmlChar ** attributes,
Rboolean sax1);
int
R_isBranch(const xmlChar *localname, RS_XMLParserData *rinfo);
void
R_endBranch(RS_XMLParserData *rinfo,
const xmlChar * localname,
const xmlChar * prefix,
const xmlChar * URI);
#if 0
typedef struct NodeList NodeList;
struct NodeList {
xmlNodePtr *el;
NodeList *next
};
#endif
/* Allocate a data structure for use with the parser */
RS_XMLParserData *createRSXMLParserData(USER_OBJECT_ handlers) ;
USER_OBJECT_ RS_XML(callUserFunction)(const char *opName, const char *preferredName, RS_XMLParserData *parser, USER_OBJECT_ opArgs) ;
/*Made static now: USER_OBJECT_ RS_XML(createAttributesList)(const char **atts); */
void RS_XML(entityDeclarationHandler)(void *userData, const XML_Char *entityName,
const XML_Char *base, const XML_Char *systemId,
const XML_Char *publicId, const XML_Char *notationName);
void RS_XML(entityDeclarationHandler)(void *userData, const XML_Char *entityName,
const XML_Char *base, const XML_Char *systemId,
const XML_Char *publicId, const XML_Char *notationName);
void RS_XML(commentHandler)(void *userData, const XML_Char *data);
void RS_XML(endElement)(void *userData, const char *name);
void RS_XML(startElement)(void *userData, const char *name, const char **atts);
void RS_XML(processingInstructionHandler)(void *userData, const XML_Char *target, const XML_Char *data);
void RS_XML(textHandler)(void *userData, const XML_Char *s, int len);
void RS_XML(startCdataSectionHandler)(void *userData) ;
void RS_XML(endCdataSectionHandler)(void *userData) ;
RS_XMLParserData *RS_XML(createParserData)(USER_OBJECT_ handlers, USER_OBJECT_ finalize);
int RS_XML(parseBufferWithParserData)(char *buf, RS_XMLParserData *parserData);
int RS_XML(notStandAloneHandler)(void *userData);
int RS_XML(libXMLEventParse)(const char *fileName, RS_XMLParserData *parserData, RS_XML_ContentSourceType asText,
int saxVersion, USER_OBJECT_ r_encoding);
USER_OBJECT_ findEndElementFun(const char *name, RS_XMLParserData *rinfo);
void updateState(USER_OBJECT_ val, RS_XMLParserData *parserData);
#endif
XML/src/xmlsecurity.c 0000644 0001751 0000144 00000001237 13607643022 014261 0 ustar hornik users #ifdef HAVE_LIBXMLSEC
#include
#include
#include
#include "Rinternals.h"
void
R_xmlSecCryptoInit(int *els)
{
int status;
els[0] = status = xmlSecCryptoInit();
if(status != 0)
return;
els[1] = status = xmlSecCryptoAppInit(NULL);
if(status != 0)
return;
els[2] = status = xmlSecCryptoInit();
}
SEXP
R_xmlSecCryptoShutdown()
{
int status;
status = xmlSecCryptoShutdown();
return(ScalarInteger(status));
}
#else
// avoid a warning about an empty translation unit.
// instead, this gives one about an unused variable!
// static int foo;
void R_xmlSecCryptoInit(int *els) {}
#endif
XML/src/Makevars.ucrt 0000644 0001751 0000144 00000000571 14023353273 014174 0 ustar hornik users PKG_CPPFLAGS= -I${LOCAL_SOFT}/include/libxml2 -I${LIB_XML}/include -D_R_=1 -DUSE_R=1 -DUSE_XML_VERSION_H=1 -DLIBXML -DUSE_EXTERNAL_SUBSET=1 -DROOT_HAS_DTD_NODE=1 -DUMP_WITH_ENCODING=1 -DXML_ELEMENT_ETYPE=1 -DXML_ATTRIBUTE_ATYPE=1 -DLIBXML2=1 -DHAVE_XML_HAS_FEATURE -DLIBXML_STATIC -DNO_XML_HASH_SCANNER_RETURN=1
PKG_LIBS = -L${LIB_XML}/lib -lxml2 -liconv -lz -llzma -lws2_32
XML/src/HTMLParse.c 0000644 0001751 0000144 00000007746 14106741723 013445 0 ustar hornik users /*
This file uses the HTML parser in libxml to provide an HTML
parser in R that is basically identical to the XML parsing interface.
It can handle files, URLs, compressed files, and raw HTML text.
It drops the DTD and validation options since these are not very relevant
for HTML. (We can add put them back if anyone wants!)
*/
#include "DocParse.h"
#include "Utils.h"
#include "libxml/HTMLparser.h"
#include "libxml/HTMLtree.h"
#include
#include
USER_OBJECT_
RS_XML(HtmlParseTree)(USER_OBJECT_ fileName, USER_OBJECT_ converterFunctions,
USER_OBJECT_ skipBlankLines, USER_OBJECT_ replaceEntities,
USER_OBJECT_ asText, USER_OBJECT_ trim, USER_OBJECT_ isURL)
{
const char *name;
xmlDocPtr doc;
USER_OBJECT_ rdoc;
USER_OBJECT_ className;
R_XMLSettings parserSettings;
int freeName = 0;
int asTextBuffer = LOGICAL_DATA(asText)[0];
int isURLDoc = LOGICAL_DATA(isURL)[0];
parserSettings.skipBlankLines = LOGICAL_DATA(skipBlankLines)[0];
parserSettings.converters = converterFunctions;
parserSettings.trim = LOGICAL_DATA(trim)[0];
if(asTextBuffer == 0) {
struct stat tmp_stat;
#ifdef USE_R
name = CHAR(STRING_ELT(fileName, 0));
#else
name = CHARACTER_DATA(fileName)[0];
#endif
if(!isURLDoc && (name == NULL || stat(name, &tmp_stat) < 0)) {
Rf_error("Can't find file %s", CHAR_DEREF(STRING_ELT(fileName, 0)) );
}
} else {
name = strdup(CHAR_DEREF(STRING_ELT(fileName, 0)));
freeName = 1;
}
#if 0
/* If one wants entities expanded directly and to appear as text. */
if(LOGICAL_DATA(replaceEntities)[0])
xmlSubstituteEntitiesDefault(1);
#endif
if(asTextBuffer) {
doc = htmlParseDoc(CHAR_TO_XMLCHAR(name), NULL);
if(doc != NULL) {
doc->name = (char *) xmlStrdup(CHAR_TO_XMLCHAR(""));
}
} else {
doc = htmlParseFile(name, NULL);
}
if(doc == NULL) {
if(freeName && name)
free((char *) name);
Rf_error("error in creating parser for %s", name);
}
PROTECT(rdoc = RS_XML(convertXMLDoc)(name, doc, converterFunctions, &parserSettings));
if(freeName && name)
free((char *) name);
#if 0
xmlFreeDoc(doc);
R_numXMLDocsFreed++;
#endif
/* Set the class for the document. */
className = NEW_CHARACTER(1);
PROTECT(className);
SET_STRING_ELT(className, 0, mkChar("HTMLDocument"));
SET_CLASS(rdoc, className);
UNPROTECT(1);
UNPROTECT(1);
return(rdoc);
}
/*
Copied from RS_XML_printXMLNode (XMLTree.c) with minor changes.
*/
USER_OBJECT_
RS_XML_dumpHTMLDoc(USER_OBJECT_ r_node, USER_OBJECT_ format, USER_OBJECT_ r_encoding, USER_OBJECT_ indent, USER_OBJECT_ outFile)
{
USER_OBJECT_ ans;
xmlDocPtr node;
const char *encoding = NULL;
xmlOutputBufferPtr buf;
xmlBufferPtr xbuf;
int oldIndent;
oldIndent = xmlIndentTreeOutput;
node = (xmlDocPtr) R_ExternalPtrAddr(r_node);
xmlIndentTreeOutput = LOGICAL(indent)[0];
#if ADD_XML_OUTPUT_BUFFER_CODE
if(Rf_length(outFile)) {
htmlSaveFile(CHAR_DEREF(STRING_ELT(outFile, 0)), node);
return(R_NilValue);
}
#endif
if(GET_LENGTH(r_encoding))
encoding = CHAR_DEREF(STRING_ELT(r_encoding, 0));
xbuf = xmlBufferCreate();
#if 1
buf = xmlOutputBufferCreateBuffer(xbuf, NULL);
#else
buf = xmlOutputBufferCreateFilename("/tmp/test.out", NULL, 0);
#endif
htmlDocContentDumpFormatOutput(buf, node, encoding, INTEGER(format)[0]);
xmlOutputBufferFlush(buf);
xmlIndentTreeOutput = oldIndent;
if(xbuf->use > 0) {
/*XXX this const char * in CHARSXP means we have to make multiple copies. */
#if 0
char *rbuf = R_alloc(sizeof(char) * (xbuf->use + 1));
memcpy(rbuf, xbuf->content, xbuf->use + 1);
PROTECT(tmp = mkChar(rbuf));
#endif
// ans = ScalarString(mkChar(xbuf->content));
DECL_ENCODING_FROM_DOC(node)
ans = ScalarString(ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(xbuf->content)));
} else
ans = NEW_CHARACTER(1);
xmlOutputBufferClose(buf);
return(ans);
}
XML/src/DocParse.c 0000644 0001751 0000144 00000124025 14552751546 013405 0 ustar hornik users /**
Routines for parsing and processing an XML document
into an R/S data structure.
* See Copyright for the license status of this software.
*/
#include "DocParse.h"
#define R_USE_XML_ENCODING 1
#include "Utils.h" /* For isBlank() */
/* For the call to stat. */
#include
#include
#include "RSDTD.h"
#include
#include
#include
int RS_XML(setNodeClass)(xmlNodePtr node, USER_OBJECT_ ans);
USER_OBJECT_ RS_XML(notifyNamespaceDefinition)(USER_OBJECT_ ns, R_XMLSettings *parserSettings);
void RS_XML(ValidationWarning)(void *ctx, const char *msg, ...);
void RS_XML(ValidationError)(void *ctx, const char *msg, ...);
static USER_OBJECT_ convertNode(USER_OBJECT_ ans, xmlNodePtr node, R_XMLSettings *parserSettings);
static void NodeTraverse(xmlNodePtr doc, USER_OBJECT_ converterFunctions, R_XMLSettings *parserSettings, int rootFirst);
static USER_OBJECT_ makeSchemaReference(xmlSchemaPtr ref);
USER_OBJECT_
RS_XML(libxmlVersionRuntime)(void)
{
return(mkString(
#if LIBXML_VERSION < 21200
*__xmlParserVersion()
#else
xmlParserVersion
#endif
));
}
USER_OBJECT_
RS_XML(getDefaultValiditySetting)(USER_OBJECT_ val)
{
#ifdef HAVE_VALIDITY
// extern int xmlDoValidityCheckingDefaultValue;
USER_OBJECT_ ans;
ans = NEW_INTEGER(1);
INTEGER_DATA(ans)[0] = xmlDoValidityCheckingDefaultValue;
if(GET_LENGTH(val))
xmlDoValidityCheckingDefaultValue = INTEGER_DATA(val)[0];
return(ans);
#else
return(NEW_INTEGER(0));
#endif
}
#include
void
R_xmlStructuredErrorHandler(void *data, xmlErrorPtr err)
{
RSXML_structuredStop((SEXP) data, err);
}
/**
Entry point for reading, parsing and converting an XML tree
to an R object.
fileName is the string identifying the file, and is
expanded using the normal rules for an R file name.
That is, it can contain environment variables, ~, etc.
converterFunctions is a collection of functions used to
map a node into an R object. This would normally
be a closure. It is not currently used, but will be enabled in
the future.
skipBlankLines controls whether text elements consisting
simply of white space are included in the resulting
structure.
The return value is a simple list with named elements
file, version and children
The children element is itself a list consisting of
objects of class `XMLNode'. Each of these has the characteristic
*/
USER_OBJECT_
RS_XML(ParseTree)(USER_OBJECT_ fileName, USER_OBJECT_ converterFunctions,
USER_OBJECT_ skipBlankLines, USER_OBJECT_ replaceEntities,
USER_OBJECT_ asText, USER_OBJECT_ trim, USER_OBJECT_ validate,
USER_OBJECT_ getDTD, USER_OBJECT_ isURL,
USER_OBJECT_ addNamespaceAttributes,
USER_OBJECT_ internalNodeReferences,
USER_OBJECT_ s_useHTML, USER_OBJECT_ isSchema,
USER_OBJECT_ fullNamespaceInfo, USER_OBJECT_ r_encoding,
USER_OBJECT_ useDotNames,
USER_OBJECT_ xinclude, USER_OBJECT_ errorFun,
USER_OBJECT_ manageMemory, USER_OBJECT_ r_parserOptions,
USER_OBJECT_ r_rootFirst)
{
const char *name;
xmlDocPtr doc;
USER_OBJECT_ rdoc, rdocObj; /* rdocObj is used to put the doc object
* under R's garbage collection.*/
USER_OBJECT_ className;
R_XMLSettings parserSettings;
int asTextBuffer = LOGICAL_DATA(asText)[0];
int isURLDoc = LOGICAL_DATA(isURL)[0];
int useHTML = LOGICAL_DATA(s_useHTML)[0];
const char *encoding = NULL;
int freeName = 0;
int parserOptions = 0;
int rootFirst = INTEGER(r_rootFirst)[0];
if(GET_LENGTH(r_encoding)) {
encoding = CHAR(STRING_ELT(r_encoding, 0));
if(!encoding[0])
encoding = NULL;
}
if(Rf_length(r_parserOptions))
parserOptions = INTEGER(r_parserOptions)[0];
parserSettings.skipBlankLines = LOGICAL_DATA(skipBlankLines)[0];
parserSettings.converters = converterFunctions;
parserSettings.useDotNames = LOGICAL_DATA(useDotNames)[0];
parserSettings.trim = LOGICAL_DATA(trim)[0];
parserSettings.xinclude = LOGICAL_DATA(xinclude)[0];
parserSettings.fullNamespaceInfo = LOGICAL_DATA(fullNamespaceInfo)[0];
parserSettings.internalNodeReferences = LOGICAL_DATA(internalNodeReferences)[0];
parserSettings.addAttributeNamespaces = LOGICAL_DATA(addNamespaceAttributes)[0];
parserSettings.finalize = manageMemory;
if(asTextBuffer == 0) {
struct stat tmp_stat;
#ifdef USE_R
name = CHAR(STRING_ELT(fileName, 0));
#else
name = CHARACTER_DATA(fileName)[0];
#endif
if(!isURLDoc && (name == NULL || stat(name, &tmp_stat) < 0)) {
Rf_error("Can't find file %s", CHAR_DEREF(STRING_ELT(fileName, 0)) );
}
} else {
name = strdup(CHAR_DEREF(STRING_ELT(fileName, 0)));
freeName = 1;
}
#if 0 /* Done in R now.*/
/* If one wants entities expanded directly and to appear as text. */
if(LOGICAL_DATA(replaceEntities)[0])
xmlSubstituteEntitiesDefault(1);
#endif
if(LOGICAL_DATA(isSchema)[0]) {
xmlSchemaPtr schema = NULL;
xmlSchemaParserCtxtPtr ctxt;
ctxt = xmlSchemaNewParserCtxt(name);
schema = xmlSchemaParse(ctxt);
xmlSchemaFreeParserCtxt(ctxt);
/*XXX make certain to cleanup the settings.
Put a finalizer on this in makeSchemaReference.
*/
return(makeSchemaReference(schema));
}
#ifdef RS_XML_SET_STRUCTURED_ERROR
xmlSetStructuredErrorFunc(errorFun == NULL_USER_OBJECT ? NULL : errorFun, R_xmlStructuredErrorHandler);
#endif
if(asTextBuffer) {
doc = useHTML ? htmlParseDoc(CHAR_TO_XMLCHAR(name), encoding) :
xmlReadMemory(name, (int)strlen(name), NULL, encoding, parserOptions) ;
/* xmlParseMemory(name, strlen(name)) */
if(doc != NULL)
doc->name = (char *) xmlStrdup(CHAR_TO_XMLCHAR(""));
} else {
doc = useHTML ? htmlParseFile(XMLCHAR_TO_CHAR(name), encoding) :
xmlReadFile(name, encoding, parserOptions) /* xmlParseFile(name) */ ;
}
#ifdef RS_XML_SET_STRUCTURED_ERROR
xmlSetStructuredErrorFunc(NULL, NULL);
#endif
if(doc == NULL) {
if(freeName && name) {
#ifdef EXPERIMENTING
free((char *) name);
#endif
}
/*XXX Just freed the name ! */
if(errorFun != NULL_USER_OBJECT) {
RSXML_structuredStop(errorFun, NULL);
} else
return(stop("XMLParseError", "error in creating parser for %s", name));
Rf_error("error in creating parser for %s", name);
}
if(TYPEOF(xinclude) == LGLSXP && LOGICAL_DATA(xinclude)[0]) {
xmlXIncludeProcessFlags(doc, XML_PARSE_XINCLUDE);
} else if(TYPEOF(xinclude) == INTSXP && GET_LENGTH(xinclude) > 0) {
xmlXIncludeProcessFlags(doc, INTEGER(xinclude)[0]);
}
if(!useHTML && LOGICAL_DATA(validate)[0]) {
xmlValidCtxt ctxt;
ctxt.error = RS_XML(ValidationError);
ctxt.warning = RS_XML(ValidationWarning);
if(!xmlValidateDocument(&ctxt, doc)) {
if(freeName && name)
free((char *) name);
Rf_error("XML document is invalid");
}
}
if(parserSettings.internalNodeReferences) {
/* Use a different approach - pass internal nodes to the converter functions*/
if(GET_LENGTH(converterFunctions) > 0) {
xmlNodePtr root;
#ifdef USE_OLD_ROOT_CHILD_NAMES
root = doc->root;
#else
root = doc->xmlRootNode;
#ifdef ROOT_HAS_DTD_NODE
if(root->next && root->children == NULL)
root = root->next;
#endif
#endif
PROTECT(rdocObj = R_createXMLDocRef(doc));
NodeTraverse(root, converterFunctions, &parserSettings, rootFirst);
UNPROTECT(1);
}
PROTECT(rdoc = NULL_USER_OBJECT);
} else {
PROTECT(rdoc = RS_XML(convertXMLDoc)(name, doc, converterFunctions, &parserSettings));
}
if(asTextBuffer && name)
free((char *) name);
if(!useHTML && !parserSettings.internalNodeReferences && LOGICAL_DATA(getDTD)[0]) {
USER_OBJECT_ ans, klass, tmp;
const char *names[] = {"doc", "dtd"};
PROTECT(ans = NEW_LIST(2));
SET_VECTOR_ELT(ans, 0, rdoc);
SET_VECTOR_ELT(ans, 1, tmp = RS_XML(ConstructDTDList)(doc, 1, NULL));
PROTECT(klass = NEW_CHARACTER(1));
SET_STRING_ELT( klass, 0, mkChar("DTDList"));
SET_CLASS(tmp, klass);
RS_XML(SetNames)(sizeof(names)/sizeof(names[0]), names, ans);
UNPROTECT(2); /* release the ans */
rdoc = ans;
}
if(parserSettings.internalNodeReferences && GET_LENGTH(converterFunctions) < 1) {
UNPROTECT(1);
return(R_createXMLDocRef(doc));
}
if(!parserSettings.internalNodeReferences) {
/* Set the class for the document. */
className = NEW_CHARACTER(1);
PROTECT(className);
SET_STRING_ELT(className, 0, mkChar(useHTML ? "HTMLDocument" : "XMLDocument"));
SET_CLASS(rdoc, className);
UNPROTECT(1);
}
UNPROTECT(1);
return(rdoc);
}
enum { FILE_ELEMENT_NAME, VERSION_ELEMENT_NAME, CHILDREN_ELEMENT_NAME, NUM_DOC_ELEMENTS};
void
NodeTraverse(xmlNodePtr root, USER_OBJECT_ converterFunctions, R_XMLSettings *parserSettings, int rootFirst)
{
xmlNodePtr c, tmp;
c = root;
while(c) {
USER_OBJECT_ ref;
#ifndef USE_OLD_ROOT_CHILD_NAMES
tmp = c->xmlChildrenNode;
#else
c->childs;
#endif
if(!rootFirst && tmp)
NodeTraverse(tmp, converterFunctions, parserSettings, rootFirst);
PROTECT(ref = R_createXMLNodeRef(c, parserSettings->finalize));
convertNode(ref, c, parserSettings);
UNPROTECT(1);
if(rootFirst && tmp)
NodeTraverse(tmp, converterFunctions, parserSettings, rootFirst);
c = c->next;
}
}
/**
Returns a named list whose elements are
file: the name of the file being processed.
version: the XML version.
root: the collection of children.
*/
USER_OBJECT_
RS_XML(convertXMLDoc)(const char *fileName, xmlDocPtr doc, USER_OBJECT_ converterFunctions,
R_XMLSettings *parserSettings)
{
USER_OBJECT_ rdoc;
USER_OBJECT_ rdoc_el_names, klass;
int n = NUM_DOC_ELEMENTS;
const char *version = "";
DECL_ENCODING_FROM_DOC(doc)
PROTECT(rdoc = NEW_LIST(n));
PROTECT(rdoc_el_names = NEW_CHARACTER(n));
/* Insert the name of the file being processed */
SET_VECTOR_ELT(rdoc, FILE_ELEMENT_NAME, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(rdoc, FILE_ELEMENT_NAME), 0,
ENC_COPY_TO_USER_STRING(doc->name ? (const xmlChar*)doc->name : (const xmlChar*)fileName));
//SET_STRING_ELT(VECTOR_ELT(rdoc, FILE_ELEMENT_NAME), 0, ENC_COPY_TO_USER_STRING(doc->name ? XMLCHAR_TO_CHAR(doc->name) : fileName));
SET_STRING_ELT(rdoc_el_names, FILE_ELEMENT_NAME, COPY_TO_USER_STRING("file"));
/* Insert the XML version information */
SET_VECTOR_ELT(rdoc, VERSION_ELEMENT_NAME, NEW_CHARACTER(1));
if(doc->version)
version = XMLCHAR_TO_CHAR(doc->version);
SET_STRING_ELT(VECTOR_ELT(rdoc, VERSION_ELEMENT_NAME), 0,
COPY_TO_USER_STRING(version));
SET_STRING_ELT(rdoc_el_names, VERSION_ELEMENT_NAME, COPY_TO_USER_STRING("version"));
/* Compute the nodes for this tree, recursively.
Note the SIDEWAYS argument to get the sibling nodes
at the root, rather than just the first and its children.
*/
{
xmlNodePtr root;
#ifdef USE_OLD_ROOT_CHILD_NAMES
root = doc->root;
#else
root = doc->xmlRootNode;
#ifdef ROOT_HAS_DTD_NODE
if(root->next && root->children == NULL)
root = root->next;
#endif
#endif
SET_VECTOR_ELT(rdoc, CHILDREN_ELEMENT_NAME, RS_XML(createNodeChildren)(root, SIDEWAYS, parserSettings));
}
SET_STRING_ELT(rdoc_el_names, CHILDREN_ELEMENT_NAME, COPY_TO_USER_STRING("children"));
SET_NAMES(rdoc, rdoc_el_names);
PROTECT(klass = NEW_CHARACTER(1));
SET_STRING_ELT(klass, 0, COPY_TO_USER_STRING("XMLDocumentContent"));
SET_CLASS(rdoc, klass);
UNPROTECT(3);
return(rdoc);
}
USER_OBJECT_
processNamespaceDefinitions(xmlNs *ns, xmlNodePtr node, R_XMLSettings *parserSettings)
{
int n = 0;
xmlNs *ptr = ns;
USER_OBJECT_ ans, tmp, names;
DECL_ENCODING_FROM_NODE(node)
while(ptr) {
ptr = ptr->next;
n++;
}
PROTECT(ans = NEW_LIST(n));
PROTECT(names = NEW_CHARACTER(n));
for(n = 0, ptr = ns; ptr ; n++, ptr = ptr->next) {
// protection suggested by rchk
tmp = PROTECT(RS_XML(createNameSpaceIdentifier)(ptr,node));
(void) RS_XML(notifyNamespaceDefinition)(tmp, parserSettings);
SET_VECTOR_ELT(ans, n, tmp);
UNPROTECT(1);
if(ptr->prefix)
SET_STRING_ELT(names, n, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(ptr->prefix)));
}
SET_NAMES(ans, names);
SET_CLASS(ans, mkString("XMLNamespaceDefinitions"));
UNPROTECT(2);
return(ans);
}
/**
Creates an R object representing the specified node, and its children
if recursive is non-zero. Certain types of nodes have
direction controls whether we take the siblings of this node
or alternatively its children.
parentUserNode the previously created user-leve node for the parent of the
target node.
*/
enum { NODE_NAME, NODE_ATTRIBUTES, NODE_CHILDREN, NODE_NAMESPACE, NODE_NAMESPACE_DEFS, NUM_NODE_ELEMENTS};
USER_OBJECT_
getNamespaceDefs(xmlNodePtr node, int recursive)
{
USER_OBJECT_ nsDef = NULL_USER_OBJECT;
if(node->nsDef || recursive) {
int numProtects = 0;
xmlNs *ptr = node->nsDef;
int n = 0;
while(ptr) {
n++; ptr = ptr->next;
}
PROTECT(nsDef = NEW_LIST(n)); numProtects++;
ptr = node->nsDef; n = 0;
while(ptr) {
SET_VECTOR_ELT(nsDef, n, RS_XML(createNameSpaceIdentifier)(ptr, node));
n++; ptr = ptr->next;
}
if(recursive && node->children) {
xmlNodePtr ptr = node->children;
USER_OBJECT_ tmp;
int i;
PROTECT(nsDef); numProtects++;
while(ptr) {
PROTECT(tmp = getNamespaceDefs(ptr, 1));
/* nsDef = Rf_appendList(nsDef, tmp); */
if(Rf_length(tmp)) {
n = Rf_length(nsDef);
PROTECT(SET_LENGTH(nsDef, n + Rf_length(tmp)));
for(i = 0; i < Rf_length(tmp); i++)
SET_VECTOR_ELT(nsDef, n + i, VECTOR_ELT(tmp, i));
UNPROTECT(3); /* old nsDef, tmp, new nsDef */
PROTECT(nsDef);
} else
UNPROTECT(1); /* tmp */
ptr = ptr->next;
}
}
SET_CLASS(nsDef, mkString("NamespaceDefinitionList"));
UNPROTECT(numProtects);
}
return(nsDef);
}
USER_OBJECT_
RS_XML(internalNodeNamespaceDefinitions)(USER_OBJECT_ r_node, USER_OBJECT_ recursive)
{
xmlNodePtr node;
if(TYPEOF(r_node) != EXTPTRSXP) {
Rf_error("R_internalNodeNamespaceDefinitions expects InternalXMLNode objects");
}
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
return(getNamespaceDefs(node, LOGICAL(recursive)[0]));
}
static USER_OBJECT_
RS_XML(createXMLNode)(xmlNodePtr node, int recursive, int direction, R_XMLSettings *parserSettings, USER_OBJECT_ parentUserNode)
{
int n = NUM_NODE_ELEMENTS;
USER_OBJECT_ ans;
USER_OBJECT_ ans_el_names;
USER_OBJECT_ nsDef = NULL_USER_OBJECT;
int addValue;
DECL_ENCODING_FROM_NODE(node)
char *contentValue = XMLCHAR_TO_CHAR(node->content);
#ifdef ROOT_HAS_DTD_NODE
if(node->type == XML_DTD_NODE)
return(NULL);
#endif
if(parserSettings->trim) {
contentValue = trim(XMLCHAR_TO_CHAR(node->content));
}
addValue = (contentValue && strlen(contentValue) && isBlank(contentValue) == 0);
#ifdef LIBXML2
if(node->type == XML_ENTITY_DECL)
return(NULL);
#endif
/* Drop text nodes that are blank, if that is what the user wanted. */
if(parserSettings->skipBlankLines && addValue == 0 && node->type == XML_TEXT_NODE)
return(NULL);
if(addValue)
n++;
/* If we have a */
if(node->type != XML_ELEMENT_DECL) {
/* Create the default return value being a list of name, attributes, children
and possibly value.
*/
PROTECT(ans = NEW_LIST(n));
PROTECT(ans_el_names = NEW_CHARACTER(n));
/* If there are namespace definitions within this node, */
if(node->nsDef) {
nsDef = processNamespaceDefinitions(node->nsDef, node, parserSettings);
SET_VECTOR_ELT(ans, NODE_NAMESPACE_DEFS, nsDef);
}
SET_VECTOR_ELT(ans, NODE_NAME, NEW_CHARACTER(1));
if(node->name)
SET_STRING_ELT(VECTOR_ELT(ans, NODE_NAME), 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(node->name)));
SET_VECTOR_ELT(ans, NODE_ATTRIBUTES, RS_XML(AttributeList)(node, parserSettings));
if(recursive)
SET_VECTOR_ELT(ans, NODE_CHILDREN, RS_XML(createNodeChildren)(node, direction, parserSettings));
else
SET_VECTOR_ELT(ans, NODE_CHILDREN, NULL_USER_OBJECT);
SET_STRING_ELT(ans_el_names, NODE_NAME, mkChar("name"));
SET_STRING_ELT(ans_el_names, NODE_ATTRIBUTES, mkChar("attributes"));
SET_STRING_ELT(ans_el_names, NODE_CHILDREN, mkChar("children"));
SET_STRING_ELT(ans_el_names, NODE_NAMESPACE, mkChar("namespace"));
SET_STRING_ELT(ans_el_names, NODE_NAMESPACE_DEFS, mkChar("namespaceDefinitions"));
if(node->ns) {
PROTECT(nsDef = NEW_CHARACTER(1));
if(!parserSettings->fullNamespaceInfo) {
if(node->ns->prefix) {
SET_STRING_ELT(nsDef, 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(node->ns->prefix)));
SET_CLASS(nsDef, mkString("XMLNamespacePrefix"));
}
} else {
if(node->ns->href)
SET_STRING_ELT(nsDef, 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(node->ns->href)));
if(node->ns->prefix)
SET_NAMES(nsDef, ScalarString(ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(node->ns->prefix)))); /* XXX change! */
SET_CLASS(nsDef, mkString("XMLNamespace"));
}
SET_VECTOR_ELT(ans, NODE_NAMESPACE, nsDef);
UNPROTECT(1);
}
if(addValue) {
SET_STRING_ELT(ans_el_names, NUM_NODE_ELEMENTS, COPY_TO_USER_STRING("value"));
SET_VECTOR_ELT(ans, NUM_NODE_ELEMENTS, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, NUM_NODE_ELEMENTS), 0, ENC_COPY_TO_USER_STRING(contentValue));
if(node->type == XML_ENTITY_REF_NODE)
SET_NAMES(VECTOR_ELT(ans, NUM_NODE_ELEMENTS), ScalarString(ENC_COPY_TO_USER_STRING(node->name)));
}
SET_NAMES(ans, ans_el_names);
/* Compute the class of this object based on the type in the
XML node.
*/
RS_XML(setNodeClass)(node, ans);
} else {
/* XML_ELEMENT_DECL */
ans = NULL_USER_OBJECT;
PROTECT(ans);
PROTECT(ans);
}
/* Now invoke any user-level converters. */
if(recursive || direction)
ans = convertNode(ans, node, parserSettings);
UNPROTECT(1);
UNPROTECT(1);
return(ans);
}
static USER_OBJECT_
convertNode(USER_OBJECT_ ans, xmlNodePtr node, R_XMLSettings *parserSettings)
{
USER_OBJECT_ val = ans;
if(parserSettings != NULL) {
USER_OBJECT_ fun = NULL;
const char *funName;
if(parserSettings->xinclude && (node->type == XML_XINCLUDE_START || node->type == XML_XINCLUDE_END)) {
return(NULL);
}
if(node->name) {
funName = XMLCHAR_TO_CHAR(node->name);
fun = RS_XML(findFunction)(funName, parserSettings->converters);
}
if(fun == NULL) {
/* Didn't find the tag-specific function in the handlers.
So see if there is one for this type node.
*/
fun = RS_XML(lookupGenericNodeConverter)(node, ans, parserSettings);
}
if(fun != NULL) {
USER_OBJECT_ opArgs = NEW_LIST(1);
PROTECT(opArgs);
SET_VECTOR_ELT(opArgs, 0, ans);
val = RS_XML(invokeFunction)(fun, opArgs, NULL, NULL);
UNPROTECT(1);
}
}
return(val);
}
const char * const XMLNodeClassHierarchy[] = {"XMLNode", "RXMLAbstractNode", "XMLAbstractNode", "oldClass"};
int
RS_XML(setNodeClass)(xmlNodePtr node, USER_OBJECT_ ans)
{
char *className = NULL;
int numEls = 1;
int lenHier = sizeof(XMLNodeClassHierarchy)/sizeof(XMLNodeClassHierarchy[0]);
numEls = lenHier + 1;
switch(node->type) {
case XML_ENTITY_REF_NODE:
className = "XMLEntityRef";
break;
case XML_PI_NODE:
className = "XMLProcessingInstruction";
break;
case XML_COMMENT_NODE:
className = "XMLCommentNode";
break;
case XML_TEXT_NODE:
className = "XMLTextNode";
break;
case XML_CDATA_SECTION_NODE:
className = "XMLCDataNode";
break;
#ifdef LIBXML2
case XML_ENTITY_DECL:
className = "XMLEntityDeclaration";
break;
#endif
default:
numEls--;
break;
}
if(1) {
USER_OBJECT_ Class;
int ctr = 0, i;
PROTECT(Class = NEW_CHARACTER(numEls));
if(className)
SET_STRING_ELT(Class, ctr++, mkChar(className));
for(i = 0; i < lenHier; i++)
SET_STRING_ELT(Class, ctr++, mkChar(XMLNodeClassHierarchy[i]));
SET_CLASS(ans, Class);
UNPROTECT(1);
}
return(node->type);
}
const char *RS_XML(NameSpaceSlotNames)[] = {"id", "uri", "local"};
enum {NAMESPACE_PREFIX_SLOT, NAMESPACE_URI_SLOT, NAMESPACE_TYPE_SLOT, NAMESPACE_NUM_SLOTS};
/**
Create a local object identifying the name space used by a particular node.
This is not the name space definition which would have a URL/URI and a type.
*/
USER_OBJECT_
RS_XML(createNameSpaceIdentifier)(xmlNs *space, xmlNodePtr node)
{
USER_OBJECT_ ans;
DECL_ENCODING_FROM_NODE(node)
if(node->nsDef) {
PROTECT(ans = NEW_LIST(3));
SET_VECTOR_ELT(ans, NAMESPACE_PREFIX_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, NAMESPACE_PREFIX_SLOT), 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR( (space->prefix ? space->prefix : (xmlChar*)""))));
SET_VECTOR_ELT(ans, NAMESPACE_URI_SLOT, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(ans, NAMESPACE_URI_SLOT), 0, space->href ? ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(space->href)) : NA_STRING);
SET_VECTOR_ELT(ans, NAMESPACE_TYPE_SLOT, NEW_LOGICAL(1));
LOGICAL_DATA(VECTOR_ELT(ans, NAMESPACE_TYPE_SLOT))[0] = (space->type == XML_LOCAL_NAMESPACE);
RS_XML(SetNames)(NAMESPACE_NUM_SLOTS, RS_XML(NameSpaceSlotNames), ans);
{
USER_OBJECT_ klass;
PROTECT(klass = NEW_CHARACTER(1));
SET_STRING_ELT(klass, 0, COPY_TO_USER_STRING("XMLNamespaceDefinition"));
SET_CLASS(ans, klass);
UNPROTECT(1);
}
UNPROTECT(1);
} else {
PROTECT(ans = NEW_CHARACTER(1));
if(space->prefix)
SET_STRING_ELT(ans, 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(space->prefix)));
UNPROTECT(1);
}
return(ans);
}
/**
Attempt to find a function in the handler methods corresponding to the
type of the node, not its specific tag name.
*/
USER_OBJECT_
RS_XML(lookupGenericNodeConverter)(xmlNodePtr node, USER_OBJECT_ defaultNodeValue,
R_XMLSettings *parserSettings)
{
#define DOT(x) parserSettings->useDotNames ? "." x : x
char *name;
USER_OBJECT_ fun = NULL;
switch(node->type) {
case XML_ENTITY_REF_NODE:
name = DOT("entity");
break;
case XML_ENTITY_NODE:
name = DOT("entity");
break;
case XML_ELEMENT_NODE:
name = DOT("startElement");
break;
case XML_PI_NODE:
name = DOT("proccesingInstruction");
break;
case XML_COMMENT_NODE:
name = DOT("comment");
break;
case XML_TEXT_NODE:
name = DOT("text");
break;
case XML_CDATA_SECTION_NODE:
name = DOT("cdata");
break;
default:
name = NULL;
}
if(name && name[0])
fun = RS_XML(findFunction)(name, parserSettings->converters);
return(fun);
}
/*
XXX Unravel this recursive call into a loop.
Starting at the top node, fix the id to be empty.
Then add the node and get the ID.
Then loop over the children, and the node and call the routine
on its children
*/
/*
at a given node, make the node
*/
void
addNodeAndChildrenToTree(xmlNodePtr node, SEXP id, SEXP e, R_XMLSettings *parserSettings, int *ctr)
{
SEXP tmp;
xmlNodePtr n;
if(!node)
return;
/* Create a skeleton node with no children. */
tmp = RS_XML(createXMLNode)(node, 0, 0/* doesn't matter */, parserSettings, R_NilValue);/*XXX*/
if(!tmp)
return;
SETCAR(CDR(e), tmp);
(*ctr)++;
id = Rf_eval(e, R_GlobalEnv);
PROTECT(id);
n = node->children;
while(n) {
SETCAR(CDR(CDR(e)), id);
addNodeAndChildrenToTree(n, id, e, parserSettings, ctr);
(*ctr)++;
n = n->next;
}
UNPROTECT(1);
}
SEXP
addNodesToTree(xmlNodePtr node, R_XMLSettings *parserSettings)
{
xmlNodePtr ptr = node;
SEXP e, id;
int ctr = 0;
PROTECT(e = allocVector(LANGSXP, 3));
SETCAR(e, parserSettings->converters);
PROTECT(id = NEW_CHARACTER(0));
ptr = node;
/* loop over the sibling nodes here in case we have multiple roots,
e.g. a comment, PI and a real node. See xysize.svg
*/
while(ptr) {
SETCAR(CDR(CDR(e)), id);
addNodeAndChildrenToTree(ptr, id, e, parserSettings, &ctr);
ptr = ptr->next;
}
UNPROTECT(2); /* e, id */
return(ScalarInteger(ctr));
}
/**
Creates the R objects representing the children or siblings of the specified
node, handling simple text cases with no children, as well as recursively
processing the children.
node the node whose children or siblings should be converted.
direction DOWN or SIDEWAYS indicating the children or siblings should
be processed, respectively. If SIDEWAYS is specified, the node itself
is included in the result.
parserSettings "global" information about the parsing conversion for the duration of the parser.
Return list of XMLNode objects.
*/
USER_OBJECT_
RS_XML(createNodeChildren)(xmlNodePtr node, int direction, R_XMLSettings *parserSettings)
{
int n = 0, i;
USER_OBJECT_ ans = NULL_USER_OBJECT;
USER_OBJECT_ elNames = NULL;
int unProtect = 0;
xmlNodePtr base, c = (direction == SIDEWAYS) ? node :
#ifndef USE_OLD_ROOT_CHILD_NAMES
node->xmlChildrenNode;
#else
node->childs;
#endif
DECL_ENCODING_FROM_NODE(node)
base = c;
if(IS_FUNCTION(parserSettings->converters)) {
return(addNodesToTree(node, parserSettings));
}
/* Count the number of elements being converted. */
while(c) {
c = c->next;
n++;
}
if(n > 0) {
USER_OBJECT_ tmp;
USER_OBJECT_ tmpNames;
int count = 0;
c = base;
PROTECT(ans = NEW_LIST(n));
PROTECT(elNames = NEW_CHARACTER(n));
unProtect = 2;
for(i = 0; i < n; i++, c = c->next) {
tmp = RS_XML(createXMLNode)(c, 1, DOWN, parserSettings, ans);
if(tmp && tmp != NULL_USER_OBJECT) {
SET_VECTOR_ELT(ans, count, tmp);
if(c->name)
SET_STRING_ELT(elNames, count, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(c->name)));
count++;
}
}
if(count < n) {
/* Reset the length! */
#ifdef USE_S
#else
PROTECT(tmp = NEW_LIST(count));
PROTECT(tmpNames = NEW_CHARACTER(count));
for(i = 0 ; i < count ; i++) {
SET_VECTOR_ELT(tmp, i, VECTOR_ELT(ans, i));
SET_STRING_ELT(tmpNames, i, STRING_ELT(elNames, i));
}
ans = tmp;
SET_NAMES(ans, tmpNames);
UNPROTECT(4);
PROTECT(ans);
unProtect = 1;
#endif
} else {
SET_NAMES(ans, elNames);
}
if(unProtect > 0)
UNPROTECT(unProtect);
}
return(ans);
}
USER_OBJECT_
RS_XML(notifyNamespaceDefinition)(USER_OBJECT_ arg, R_XMLSettings *parserSettings)
{
USER_OBJECT_ fun, ans = NULL_USER_OBJECT;
fun = RS_XML(findFunction)("namespace", parserSettings->converters);
if(fun != NULL) {
USER_OBJECT_ opArgs = NEW_LIST(1);
USER_OBJECT_ tmp;
PROTECT(opArgs);
SET_VECTOR_ELT(opArgs, 0, arg);
tmp = RS_XML(invokeFunction)(fun, opArgs, NULL, NULL);
ans = tmp;
UNPROTECT(1);
}
return(ans);
}
#ifdef USE_XML_VERSION_H
#ifndef LIBXML_TEST_VERSION
#include
#endif
#endif
USER_OBJECT_
RS_XML(libxmlVersion)(void)
{
USER_OBJECT_ ans;
unsigned int val;
#ifdef LIBXML_VERSION_NUMBER
val = LIBXML_VERSION_NUMBER;
#else
#ifdef LIBXML_VERSION
val = LIBXML_VERSION;
#else
val = 0;
#endif
#endif
ans = NEW_NUMERIC(1);
NUMERIC_DATA(ans)[0] = val;
return(ans);
}
static
void
notifyError(const char *msg, va_list ap, Rboolean isError)
{
#if 0
if(isError) {
Rf_error("error in validating XML document");
} else {
Rf_error("warning when validating XML document");
}
#else
#define BUFSIZE 2048
char buf[BUFSIZE];
memset(buf, '\0', BUFSIZE);
vsnprintf(buf, BUFSIZE, msg, ap);
Rf_warning("%s", buf);
#endif
}
void
RS_XML(ValidationError)(void *ctx, const char *format, ...)
{
char *msg = "Message unavailable";
va_list(ap);
va_start(ap, format);
if(strcmp(format, "%s") == 0)
msg = va_arg(ap, char *);
va_end(ap);
notifyError(msg, ap, TRUE);
}
void
RS_XML(ValidationWarning)(void *ctx, const char *format, ...)
{
char *msg = "Message unavailable";
va_list(ap);
va_start(ap, format);
if(strcmp(format, "%s") == 0)
msg = va_arg(ap, char *);
va_end(ap);
notifyError(msg, ap, FALSE);
}
USER_OBJECT_
R_createXMLNode(USER_OBJECT_ snode, USER_OBJECT_ handlers, USER_OBJECT_ r_trim, USER_OBJECT_ r_skipBlankLines)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
R_XMLSettings parserSettings;
parserSettings.converters = handlers;
parserSettings.trim = LOGICAL(r_trim)[0];
parserSettings.skipBlankLines = LOGICAL(r_skipBlankLines)[0];
return(RS_XML(createNodeChildren)(node, SIDEWAYS, &parserSettings));
}
USER_OBJECT_
RS_XML_xmlNodeName(USER_OBJECT_ snode)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
USER_OBJECT_ ans;
DECL_ENCODING_FROM_NODE(node)
PROTECT(ans = NEW_CHARACTER(1));
SET_STRING_ELT(ans, 0, node->name ? ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(node->name)) : R_NaString);
UNPROTECT(1);
return(ans);
}
USER_OBJECT_
RS_XML_xmlNodeNamespace(USER_OBJECT_ snode)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
USER_OBJECT_ ans;
xmlNs *ns;
DECL_ENCODING_FROM_NODE(node)
ns = node->ns;
if(!ns)
return(NEW_CHARACTER(0));
PROTECT(ans = NEW_CHARACTER(1));
if(ns->href)
SET_STRING_ELT(ans, 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(ns->href)));
if(ns->prefix)
SET_NAMES(ans, ScalarString(ENC_COPY_TO_USER_STRING(ns->prefix)));
SET_CLASS(ans, mkString("XMLNamespace"));
UNPROTECT(1);
return(ans);
}
enum {
R_XML_NS_ADD_PREFIX = 1,
R_XML_NS_ADD_URL_DEFS = 2
};
USER_OBJECT_
RS_XML_xmlNodeAttributes(USER_OBJECT_ snode, USER_OBJECT_ addNamespaces, USER_OBJECT_ addNamespaceURLs)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
R_XMLSettings parserSettings;
parserSettings.addAttributeNamespaces = 0;
if(LOGICAL_DATA(addNamespaces)[0])
parserSettings.addAttributeNamespaces |= R_XML_NS_ADD_PREFIX;
if(LOGICAL_DATA(addNamespaceURLs)[0])
parserSettings.addAttributeNamespaces |= R_XML_NS_ADD_URL_DEFS;
return(RS_XML(AttributeList)(node, &parserSettings));
}
USER_OBJECT_
RS_XML_xmlNodeParent(USER_OBJECT_ snode, USER_OBJECT_ manageMemory)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
if(node->parent && (node->parent->type == XML_DOCUMENT_NODE || node->parent->type == XML_HTML_DOCUMENT_NODE))
return(NULL_USER_OBJECT);
return(R_createXMLNodeRef(node->parent, manageMemory));
}
USER_OBJECT_
RS_XML_xmlNodeNumChildren(USER_OBJECT_ snode)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
int count = 0;
xmlNodePtr ptr = node->children;
while(ptr) {
count++;
ptr = ptr->next;
}
return(ScalarInteger(count));
}
USER_OBJECT_
RS_XML_xmlNodeChildrenReferences(USER_OBJECT_ snode, USER_OBJECT_ r_addNames, USER_OBJECT_ manageMemory)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
USER_OBJECT_ ans, names = R_NilValue;
int count = 0, i;
xmlNodePtr ptr = node->children;
int addNames = LOGICAL(r_addNames)[0];
DECL_ENCODING_FROM_NODE(node)
int nprot = 0;
while(ptr) {
count++;
ptr = ptr->next;
}
ptr = node->children;
PROTECT(ans = NEW_LIST(count)); nprot++;
if(addNames) {
PROTECT(names = NEW_CHARACTER(count));
nprot++;
}
for(i = 0; i < count ; i++, ptr = ptr->next) {
SET_VECTOR_ELT(ans, i, R_createXMLNodeRef(ptr, manageMemory));
if(addNames)
SET_STRING_ELT(names, i, ENC_COPY_TO_USER_STRING(ptr->name ? ptr->name : (const xmlChar *)""));
}
if(addNames)
SET_NAMES(ans, names);
UNPROTECT(nprot);
return(ans);
}
USER_OBJECT_
R_getNodeChildByIndex(USER_OBJECT_ snode, USER_OBJECT_ r_index, USER_OBJECT_ manageMemory)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(snode);
int count = 0, num;
xmlNodePtr ptr = node->children;
num = INTEGER(r_index)[0] - 1;
if(num < 0) {
Rf_error("cannot index an internal node with a negative number %d", num);
}
while(ptr && count < num) {
count++;
ptr = ptr->next;
}
return(ptr ? R_createXMLNodeRef(ptr, manageMemory) : NULL_USER_OBJECT);
}
static USER_OBJECT_
makeSchemaReference(xmlSchemaPtr schema)
{
return(R_makeRefObject(schema, "xmlSchemaRef"));
/*
USER_OBJECT_ ans;
PROTECT(ans = R_MakeExternalPtr(schema, Rf_install("XMLSchema"), R_NilValue));
SET_CLASS(ans, mkString("XMLSchema"));
UNPROTECT(1);
return(ans);
*/
}
// unused
#define NO_XML_MEMORY_SHOW_ROUTINE 1
#ifndef NO_XML_MEMORY_SHOW_ROUTINE
void
RS_XML_MemoryShow()
{
xmlMemDisplay(stderr);
}
#endif
USER_OBJECT_
RS_XML_setDocumentName(USER_OBJECT_ sdoc, USER_OBJECT_ sname)
{
/* if doc is NULL in C , return NULL in R
If doc->name is NULL in C, return NA
Otherwise, return the string.
*/
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
if(!doc) {
Rf_warning("NULL pointer supplied for internal document");
return(R_NilValue);
}
doc->URL = xmlStrdup(CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(sname, 0))));
return(sdoc);
}
USER_OBJECT_
RS_XML_getDocumentName(USER_OBJECT_ sdoc)
{
/* if doc is NULL in C , return NULL in R
If doc->name is NULL in C, return NA
Otherwise, return the string.
*/
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(sdoc);
USER_OBJECT_ ans;
const xmlChar *encoding;
if(!doc) {
Rf_warning("NULL pointer supplied for internal document");
return(R_NilValue);
}
encoding = doc->encoding;
PROTECT(ans = NEW_CHARACTER(1));
SET_STRING_ELT(ans, 0, doc->URL ? ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(doc->URL)) : R_NaString);
UNPROTECT(1);
return(ans);
}
USER_OBJECT_
RS_XML_setKeepBlanksDefault(USER_OBJECT_ val)
{
int prev;
prev = xmlKeepBlanksDefault(INTEGER(val)[0]);
return(ScalarInteger(prev));
}
USER_OBJECT_
RS_XML_xmlXIncludeProcessFlags(USER_OBJECT_ r_doc, USER_OBJECT_ r_flags)
{
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(r_doc);
int ans;
ans = xmlXIncludeProcessFlags(doc, INTEGER(r_flags)[0]);
return(ScalarInteger(ans));
}
USER_OBJECT_
RS_XML_xmlXIncludeProcessTreeFlags(USER_OBJECT_ r_node, USER_OBJECT_ r_flags)
{
xmlNodePtr node;
int flags = INTEGER(r_flags)[0];
int n;
//xmlNodePtr prev, parent;
SEXP ans = R_NilValue;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
//prev = node->prev;
//parent = node->parent;
n = xmlXIncludeProcessTreeFlags(node, flags);
if(n == 0)
return(R_NilValue);
else if(n == -1) {
Rf_error("failed in XInclude");
}
#if 0
if(!prev) {
fprintf(stderr, "Adding to children of %s\n", prev->name);
prev = parent->children;
} else {
fprintf(stderr, "Adding after %s\n", prev->name);
prev = prev->next;
}
prev = node->next;
PROTECT(ans = NEW_LIST(n));
for(i = 0; i < n; i++) {
SET_VECTOR_ELT(ans, i, prev ? R_createXMLNodeRef(prev) : R_NilValue);
prev = prev->next;
}
UNPROTECT(1);
#endif
return(ans);
}
/**
Create an R named list containing the attributes of the specified node.
*/
/*
We could use the CONS mechanism rather than doing a double pass.
Not certain what is quicker in this situation. Also, doesn't
work that way in S4, so keep it this way.
*/
USER_OBJECT_
RS_XML(AttributeList)(xmlNodePtr node, R_XMLSettings *parserSettings)
{
USER_OBJECT_ ans = NULL_USER_OBJECT;
USER_OBJECT_ ans_names;
xmlAttr * atts;
const xmlChar *encoding = node->doc ? node->doc->encoding : NULL;
int n = 0, i;
/* Count the number of attributes*/
atts = node->properties;
while(atts) {
n++;
atts = atts->next;
}
if(n > 0) {
SEXP ans_namespaces, ans_namespaceDefs;
int nonTrivialAttrNamespaces = 0;
int addNSPrefix = parserSettings->addAttributeNamespaces & R_XML_NS_ADD_PREFIX;
int retNSDefs = parserSettings->addAttributeNamespaces & R_XML_NS_ADD_URL_DEFS;
PROTECT(ans = NEW_CHARACTER(n));
PROTECT(ans_names = NEW_CHARACTER(n));
PROTECT(ans_namespaces = NEW_CHARACTER(n));
PROTECT(ans_namespaceDefs = NEW_CHARACTER(retNSDefs ? n : 0));
/* Loop over the attributes and create the string elements
and the elements of the name vector.
*/
atts = node->properties;
for(i=0; i < n ; i++) {
/* Have to be careful that atts->val and atts->val->context are non-null. Something like
kills it otherwise.
*/
#ifdef LIBXML2
SET_STRING_ELT(ans, i,
ENC_COPY_TO_USER_STRING(
XMLCHAR_TO_CHAR(
((atts->xmlChildrenNode != (xmlNode*)NULL && atts->xmlChildrenNode->content != (xmlChar*)NULL )
? atts->xmlChildrenNode->content : (xmlChar*)""))));
#else
SET_STRING_ELT(ans, i, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(((atts->val != (xmlNode*)NULL && atts->val->content != (xmlChar*)NULL )
? atts->val->content : (xmlChar*)""))));
#endif
if(atts->name) {
if(addNSPrefix && atts->ns && atts->ns->prefix) {
char buf[400];
snprintf(buf, 400, "%s:%s", atts->ns->prefix, atts->name);
SET_STRING_ELT(ans_names, i, ENC_COPY_TO_USER_STRING(buf));
} else
SET_STRING_ELT(ans_names, i, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(atts->name)));
if((addNSPrefix | retNSDefs) && atts->ns && atts->ns->prefix) {
SET_STRING_ELT(ans_namespaces, i, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(atts->ns->prefix)));
if(retNSDefs)
SET_STRING_ELT(ans_namespaceDefs, i, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(atts->ns->href)));
nonTrivialAttrNamespaces++;
}
}
atts = atts->next;
}
if(nonTrivialAttrNamespaces) {
if(retNSDefs)
Rf_setAttrib(ans_namespaces, Rf_install("names"), ans_namespaceDefs);
Rf_setAttrib(ans, Rf_install("namespaces"), ans_namespaces);
}
SET_NAMES(ans, ans_names);
UNPROTECT(4);
}
#if 0
else
ans = NEW_CHARACTER(0);
#endif
return(ans);
}
SEXP
R_getDocEncoding(SEXP r_doc)
{
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(r_doc);
const xmlChar *encoding;
SEXP ans;
if(doc->type != XML_DOCUMENT_NODE && doc->type != XML_HTML_DOCUMENT_NODE)
doc = ((xmlNodePtr) doc)->doc;
if(!doc)
return(NEW_CHARACTER(0));
encoding = doc->encoding;
PROTECT(ans = NEW_CHARACTER(1));
SET_STRING_ELT(ans, 0, encoding ? CreateCharSexpWithEncoding(doc->encoding, doc->encoding) : R_NaString);
UNPROTECT(1);
return(ans);
}
int
getTextElementLineNumber(xmlNodePtr node)
{
int val = -1;
if(node->parent)
val = node->parent->line;
xmlNodePtr prev = node->prev;
while(prev) {
if(prev->line > 0) {
val = prev->line;
break;
}
prev = prev->prev;
}
return(val);
}
SEXP
R_getLineNumber(SEXP r_node)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
if(!node) {
return(NEW_INTEGER(0));
}
// XML_GET_LINE(node)
return(ScalarInteger(node->line == 0 ?
getTextElementLineNumber(node) : node->line));
}
SEXP
R_xmlReadFile(SEXP r_filename, SEXP r_encoding, SEXP r_options) //, SEXP manageMemory)
{
const char *filename;
const char *encoding = NULL;
int options;
xmlDocPtr doc;
filename = CHAR_DEREF(STRING_ELT(r_filename, 0));
if(Rf_length(r_encoding))
encoding = CHAR_DEREF(STRING_ELT(r_encoding, 0));
options = INTEGER(r_options)[0];
doc = xmlReadFile(filename, encoding, options);
return(R_createXMLDocRef(doc));
}
SEXP
R_xmlReadMemory(SEXP r_txt, SEXP len, SEXP r_encoding, SEXP r_options, SEXP r_base) //, SEXP manageMemory)
{
const char *txt;
const char *encoding = NULL;
const char *baseURL = NULL;
int options;
xmlDocPtr doc;
txt = CHAR_DEREF(STRING_ELT(r_txt, 0));
if(Rf_length(r_encoding))
encoding = CHAR_DEREF(STRING_ELT(r_encoding, 0));
options = INTEGER(r_options)[0];
if(Rf_length(r_base))
baseURL = CHAR_DEREF(STRING_ELT(r_base, 0));
doc = xmlReadMemory(txt, INTEGER(len)[0], baseURL, encoding, options);
return(R_createXMLDocRef(doc));
}
#if 1
int
addXInclude(xmlNodePtr ptr, SEXP *ans, int level, SEXP manageMemory)
{
if(ptr->type == XML_XINCLUDE_START) {
int len = Rf_length(*ans) + 1;
SEXP oans = *ans; // avoid sequence-point error
PROTECT(*ans = SET_LENGTH(oans, len));
SET_VECTOR_ELT(*ans, len - 1, R_createXMLNodeRef(ptr, manageMemory));
UNPROTECT(1);
return(1);
} else
return(0);
}
int
processKids(xmlNodePtr ptr, SEXP *ans, int level, SEXP manageMemory)
{
xmlNodePtr kids;
int count = 0;
kids = ptr->children;
while(kids) {
count += addXInclude(kids, ans, level, manageMemory);
count += processKids(kids, ans, level + 1, manageMemory);
kids = kids->next;
}
return(count);
}
#if 0
int
findXIncludeStartNodes(xmlNodePtr node, SEXP *ans, int level)
{
const char * prefix[] = {"", " ", " ", " " };
xmlNodePtr ptr = node;
int count = 0;
addXInclude(node, ans, level);
ptr = node;
while(ptr) {
count += addXInclude(ptr, ans, level);
count += processKids(ptr, ans, level);
ptr = ptr->next;
}
//fprintf(stderr, "%s level = %d, %s: %p, type = %d\n", prefix[level], level, ptr->name, node, ptr->type);
//fprintf(stderr, "%p, %s, level = %d, type = %d\n", ptr, ptr->name, level, ptr->type);
return(count);
}
#endif
/*
This is a recursive version. We want an iterative version.
*/
SEXP
R_findXIncludeStartNodes(SEXP r_root, SEXP manageMemory)
{
xmlNodePtr root;
SEXP ans;
root = (xmlNodePtr) R_ExternalPtrAddr(r_root);
if(!root)
return(R_NilValue);
PROTECT(ans = allocVector(VECSXP, 0));
addXInclude(root, &ans, 0, manageMemory);
processKids(root, &ans, 0, manageMemory);
UNPROTECT(1);
return(ans);
}
#endif
XML/src/Utils.c 0000644 0001751 0000144 00000021761 14343054535 013000 0 ustar hornik users /**
Routines that are shared across the two XML parsers and their callbacks
to R.
isBlank - determines if a string consists entirely of whitespace
RS_XML(invokeFunction) - call a user-level function, previously located
by RS_XML(findFunction).
RS_XML(findFunction) - search a list or closure for a function object
with a given name in that list.
* See Copyright for the license status of this software.
*/
#include "Utils.h"
#include /* For isspace() */
#ifdef LIBXML
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
#endif
#include "RSCommon.h" /* for SET_NAMES */
/**
Tests whether the string contains only white space
or not. Returns 1 if is only white space. 0 otherwise.
*/
int isBlank(const char *str)
{
int blank=0;
const char *ptr = str;
while(ptr && (blank = isspace(ptr[0]))) {
ptr++;
}
return(blank);
}
/**
Does an in place trimming of a string by returning a pointer
to the first non-white space character and also inserting a
string terminator after the last non-whitespace character.
*/
char *
trim(char *str)
{
char *tmp;
/* If a degenerate string, just return. */
if(str == (char*)NULL || str[0] == '\0')
return(str);
/* Jumpt to the end */
tmp = str + strlen(str) - 1;
while(tmp >= str && isspace(*tmp)) {
tmp[0] = '\0';
tmp--;
}
if(tmp == str) {
#if 0
if(strlen(tmp) > 1)
tmp[0] = '\0';
#endif
return(str);
}
#if 0
else
tmp[1] = '\0';
#endif
tmp = str;
while(*tmp && isspace(*tmp)) {
tmp++;
}
return(tmp);
}
USER_OBJECT_
RS_XML(treeApply)(USER_OBJECT_ rtree, USER_OBJECT_ function, USER_OBJECT_ args)
{
return(rtree);
}
/**
Error handling utilities for use with the libxml document parsing mechanism.
Intercept the error handling by replacing it with a routine of the same name
and have it print to a buffer. Then call the Warning handler. Then the warnings
will end up in the local system, accessible via the warnings() function.
This allows them to be programmatically processed rather than having to process
the output to the terminal (via catching it in a call sink()).
*/
#ifdef LIBXML
#include
void localXmlParserPrintFileInfo(xmlParserInputPtr input, char *buf, int nbuf);
#ifndef USE_LINKED_ERROR_HANDLER
void S_xmlParserError(void *ctx, const char *msg, ...)
#else
void xmlParserError(void *ctx, const char *msg, ...)
#endif
{
va_list args;
#if 1
va_start(args, msg);
stop("XMLParserError", msg, args);
#else
xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr) ctx;
char buf[3000], *tmp;
/* Empty the string buffer. */
memset(buf , '\0', sizeof(buf)/sizeof(buf[0]));
/* Insert the file and line number. */
localXmlParserPrintFileInfo(ctxt->input, buf, 3000);
/* Move to the end of the buffer's contents. */
tmp = buf + strlen(buf);
va_start(args, msg);
/* Write in the actual message. */
vsprintf(tmp, msg, args);
va_end(args);
Rf_warning("XML Parsing Error: %s", buf);
#endif
}
#ifndef USE_LINKED_ERROR_HANDLER
/*
Set the default error handlers in the libxml library
*/
void
RSXML_setErrorHandlers(void)
{
// Next 2 are deprecated in 2.10.x and will be made private
#if LIBXML_VERSION < 21000
xmlDefaultSAXHandlerInit();
htmlDefaultSAXHandlerInit();
xmlDefaultSAXHandler.error = S_xmlParserError;
htmlDefaultSAXHandler.error = S_xmlParserError;
#endif
#if 0
docbDefaultSAXHandlerInit();
docbDefaultSAXHandler.error = S_xmlParserError;
#endif
}
#endif
/**
Write the file name and the current line number into the specified
string.
*/
void localXmlParserPrintFileInfo(xmlParserInputPtr input, char *buf, int nbuf) {
if (input != NULL) {
if (input->filename)
snprintf(buf, nbuf, "%s:%d: ", input->filename,
input->line);
else
snprintf(buf, nbuf, "Entity: line %d: ", input->line);
}
}
#endif
/**
Utility method for setting the names of a list/vector from an array of
native strings rather than an R/S character vector structure.
*/
void
RS_XML(SetNames)(int n, const char *cnames[], USER_OBJECT_ ans)
{
int i;
USER_OBJECT_ names;
PROTECT(names = NEW_CHARACTER(n));
for(i = 0; i < n ; i++) {
/* could install as a pre-defined string. */
SET_STRING_ELT(names, i, mkChar(cnames[i]));
}
SET_NAMES(ans, names);
UNPROTECT(1);
}
/*
Set the class of the target object to be the character vector containing
just the specified name.
*/
int
RS_XML(SetClassName)(const char *localClassName, USER_OBJECT_ target)
{
USER_OBJECT_ className;
PROTECT(className = NEW_CHARACTER(1));
SET_STRING_ELT(className, 0, mkChar(localClassName));
SET_CLASS(target, className);
UNPROTECT(1);
return(1);
}
#if LIBXML2
struct _xmlHashTable {
struct _xmlHashEntry **table;
int size;
};
#endif
#if OWN_XML_HASH_SIZE
int xmlHashSize(xmlHashTablePtr table)
{
/* For version 2.2.* */
return(table->size);
/*
return(table->nb_entities);
*/
}
#endif
USER_OBJECT_
RS_XML(findFunction)(const char *opName, USER_OBJECT_ _userObject)
{
int i;
USER_OBJECT_ fun = NULL;
/* Get the names of the list. */
USER_OBJECT_ names = GET_NAMES(_userObject);
/* lookup function in the names of the list */
for (i = 0; i < GET_LENGTH(names); i++) {
if(!strcmp(opName, CHAR_DEREF(STRING_ELT(names, i)))) {
fun = VECTOR_ELT(_userObject, i);
break;
}
}
return(fun);
}
SEXP
R_makeRefObject(void *ref, const char *className)
{
SEXP klass, obj, sref;
if(!ref) {
Rf_warning("NULL value for external reference");
return(R_NilValue);
}
PROTECT(klass = MAKE_CLASS((char *) className)); /* XXX define MAKE_CLASS with const */
if(klass == R_NilValue) { /* Is this the right test? */
Rf_error("Cannot find class %s for external reference", className);
}
PROTECT(obj = NEW_OBJECT(klass));
PROTECT(sref = R_MakeExternalPtr(ref, Rf_install(className), R_NilValue));
obj = SET_SLOT(obj, Rf_install("ref"), sref);
UNPROTECT(3);
return(obj);
}
#include
#define copyStrField(x) SET_VECTOR_ELT(ans, i, mkString(uri->x ? uri->x : "")); \
SET_STRING_ELT(names, i, mkChar(#x)); i++;
SEXP
R_parseURI(SEXP r_uri)
{
xmlURIPtr uri;
SEXP ans, names;
int i= 0;
uri = xmlParseURI( CHAR( STRING_ELT( r_uri, 0 )));
if(!uri) {
Rf_error("cannot parse URI %s", CHAR( STRING_ELT( r_uri, 0) ) );
}
PROTECT(ans = NEW_LIST(8));
PROTECT(names = NEW_CHARACTER(8));
copyStrField(scheme);
copyStrField(authority);
copyStrField(server);
copyStrField(user);
copyStrField(path);
copyStrField(query);
copyStrField(fragment);
SET_VECTOR_ELT(ans, i, ScalarInteger(uri->port));
SET_STRING_ELT(names, i, mkChar("port"));
SET_NAMES(ans, names);
UNPROTECT(2);
return(ans);
}
#define min(x, y) ((x) < (y) ? (x) : (y))
SEXP
RSXML_structuredStop(SEXP errorFun, xmlErrorPtr err)
{
SEXP e, ptr;
int n = 8;
if(!err)
n = 2;
PROTECT(e = allocVector(LANGSXP, n));
SETCAR(e, errorFun != NULL && errorFun != R_NilValue ? errorFun : Rf_install("xmlStructuredStop"));
ptr = CDR(e);
if(err) {
SETCAR(ptr, mkString(err->message));
ptr= CDR(ptr);
SETCAR(ptr, ScalarInteger(err->code));
ptr= CDR(ptr);
SETCAR(ptr, ScalarInteger(err->domain));
ptr= CDR(ptr);
SETCAR(ptr, ScalarInteger(err->line));
ptr= CDR(ptr);
SETCAR(ptr, ScalarInteger(err->int2));
ptr= CDR(ptr);
SETCAR(ptr, ScalarInteger(err->level));
ptr= CDR(ptr);
SETCAR(ptr, err->file ? mkString(err->file) : NEW_CHARACTER(0));
} else {
SETCAR(ptr, NEW_CHARACTER(0));
}
Rf_eval(e, R_GlobalEnv);
UNPROTECT(1);
/* Shouldn't get back to here! Rf_eval() should raise an error.*/
return(R_NilValue);
}
/*
Because we call this function via Rf_eval(), we end up
with an extra call on the stack when we enter recover.
*/
SEXP
stop(const char *className, const char *msg, ...)
{
char buf[10000];
SEXP error, e, ns_env, ns_name;
va_list ap;
va_start(ap, msg);
/* Rvsnprintf(buf, sizeof(buf)/sizeof(buf[0]), msg, ap); */
vsnprintf(buf, sizeof(buf)/sizeof(buf[0]), msg, ap);
va_end(ap);
PROTECT(error = mkString(buf));
/*
const char * classNames[] = {"simpleError", "error", "condition"};
PROTECT(tmp = allocVector(STRSXP, sizeof(classNames)/sizeof(classNames[0])));
for(i = 0; i < sizeof(classNames)/sizeof(classNames[0]); i++)
SET_STRING_ELT(tmp, i+1, mkChar(classNames[i]));
SET_STRING_ELT(tmp, 0, mkChar(className));
SET_CLASS(error, tmp);
*/
PROTECT(e = allocVector(LANGSXP, 2));
PROTECT(ns_name = mkString("XML"));
PROTECT(ns_env = R_FindNamespace(ns_name));
SETCAR(e, findVarInFrame(ns_env, Rf_install("xmlStop")));
SETCAR(CDR(e), error);
Rf_eval(e, R_GlobalEnv);
UNPROTECT(4);
/*
errorcall(error, "%s", msg);
UNPROTECT(1);
*/
return(error);
}
XML/src/Rcatalog.c 0000644 0001751 0000144 00000004122 14316271033 013416 0 ustar hornik users #include "Utils.h" /* For isBlank() */
#include
USER_OBJECT_
R_xmlCatalogResolve(USER_OBJECT_ r_id, USER_OBJECT_ type, USER_OBJECT_ debug)
{
xmlChar *id;
SEXP r_ans = R_NilValue;
xmlChar* ans = NULL;
int debugLevel = -1;
int n, i;
debugLevel = xmlCatalogSetDebug(LOGICAL(debug)[0]);
n = GET_LENGTH(r_id);
PROTECT(r_ans = NEW_CHARACTER(n));
for(i = 0; i < n; i++) {
id = CHAR_TO_XMLCHAR(CHAR_DEREF(STRING_ELT(r_id, i)));
switch(INTEGER(type)[i]) {
case 1:
ans = xmlCatalogResolveURI(id);
break;
case 2:
ans = xmlCatalogResolvePublic(id);
break;
case 3:
ans = xmlCatalogResolveSystem(id);
break;
default:
break;
}
if(ans) {
SET_STRING_ELT(r_ans, i, mkChar(XMLCHAR_TO_CHAR(ans)));
xmlFree(ans);
} else {
SET_STRING_ELT(r_ans, i, NA_STRING);
}
}
UNPROTECT(1);
xmlCatalogSetDebug(debugLevel);
return(r_ans);
}
SEXP
RS_XML_loadCatalog(SEXP catalogs)
{
int i, n;
SEXP ans;
n = GET_LENGTH(catalogs);
ans = NEW_LOGICAL(n);
for(i = 0; i < n ; i++) {
LOGICAL(ans)[i] = (xmlLoadCatalog(CHAR(STRING_ELT(catalogs, i))) == 0);
}
return(ans);
}
SEXP
RS_XML_clearCatalog(void)
{
xmlCatalogCleanup();
return(ScalarLogical(1));
}
SEXP
RS_XML_catalogAdd(SEXP orig, SEXP replace, SEXP type)
{
int i, n;
SEXP ans;
n = LENGTH(orig);
ans = NEW_LOGICAL(n);
for(i = 0; i < n ; i++) {
LOGICAL(ans)[i] = (xmlCatalogAdd(CHAR_TO_XMLCHAR(CHAR(STRING_ELT(type, i))),
CHAR_TO_XMLCHAR(CHAR(STRING_ELT(orig, i))),
CHAR_TO_XMLCHAR(CHAR(STRING_ELT(replace, i)))) == 0);
}
return(ans);
}
SEXP
RS_XML_catalogDump(SEXP fileName)
{
FILE *out;
out = fopen(CHAR(STRING_ELT(fileName, 0)), "w");
if(!out) {
Rf_error("Can't open file %s for write access", CHAR(STRING_ELT(fileName, 0)) );
}
xmlCatalogDump(out);
return(ScalarLogical(TRUE));
}
void
R_xmlInitializeCatalog(void)
{
xmlInitializeCatalog();
}
XML/src/NodeGC.c 0000644 0001751 0000144 00000025422 14317044444 012774 0 ustar hornik users /*
THIS COMMENT MAY NOT BE UP TO DATE. Sep 19 2011.
The idea is as follows. We use the private field in the xmlDocPtr
object to store information about nodes that are out in the wild,
i.e. that have been returned to R across the .Call() interface.
Each time a node is returned, we increment the number of references
to that node by incrementing a table in the xmlDocPtr.
Each time these R objects are garbage collected, we decrement the
reference count. When the number of references to that node go to 0,
we remove that entry from the table. When all the node entries
are removed and the document itself is no longer being pointed to,
we free the document.
What about circularity? Does it occur?
What happens when we reparent a node?
What happens when we put a node into an R object
e.g. x = node[[2]]
y[[3]] = x
Will R's garbage collection handle this for us?
*/
/*
This now contains the code related to our memory management.
*/
#include "Utils.h"
#include "NodeGC.h"
void R_xmlFreeDoc(SEXP ref)
{
xmlDocPtr doc;
doc = (xmlDocPtr) R_ExternalPtrAddr(ref);
if(doc && !IS_NOT_OUR_DOC_TO_TOUCH(doc)) {
int *val;
val = doc->_private;
if(val) {
(*val)--;
if(*val) {
#ifdef R_XML_DEBUG
fprintf(stderr, "Not freeing XML document %p (%s); still has %d references in the wild\n", doc, doc->URL ? doc->URL : "?", *val);
#endif
R_ClearExternalPtr(ref);
return;
}
}
#ifdef R_XML_DEBUG
const xmlChar *url = doc->URL ? doc->URL : (doc->name ? doc->name : (const xmlChar *)"?? (internally created)");
fprintf(stderr, "Cleaning up document %p, %s, has children %d\n", (void *) doc, url, (int) (doc->children != NULL));
#endif
if(val) {
free(val);
doc->_private = NULL;
#ifdef R_XML_DEBUG
fprintf(stderr, "Freeing the XML doc %p\n", doc);
#endif
xmlFreeDoc(doc);
R_numXMLDocsFreed++;
} /* was before the xmlFreeDoc so that that was unconditional.*/
}
R_ClearExternalPtr(ref);
}
SEXP
RS_XML_freeDoc(SEXP ref)
{
R_xmlFreeDoc(ref);
return(R_NilValue);
}
SEXP
RS_XML_forceFreeDoc(SEXP ref)
{
xmlDocPtr doc;
doc = (xmlDocPtr) R_ExternalPtrAddr(ref);
xmlFreeDoc(doc);
return(R_NilValue);
}
/* This is a finalizer that removes the nodes and disassociates the
node and the document and then frees the document structure.
Does xmlFreeDoc() deal with the URL and name fields in the doc?
XXX With the nodes and document under garbage collection, do we really
need this?
*/
void R_xmlFreeDocLeaveChildren(SEXP ref)
{
xmlDocPtr doc;
doc = (xmlDocPtr) R_ExternalPtrAddr(ref);
if(doc) {
xmlNodePtr tmp;
#ifdef R_XML_DEBUG
const xmlChar *url = doc->URL ? doc->URL : (doc->name ? doc->name : (const xmlChar *) "?? (internally created)");
fprintf(stderr, "Cleaning up document but not children: %p, %s\n", (void *) doc, url);
#endif
tmp = doc->children;
xmlUnlinkNode(doc->children);
tmp->doc = NULL;
xmlFreeDoc(doc);
R_numXMLDocsFreed++;
}
R_ClearExternalPtr(ref);
}
int R_XML_MemoryMgrMarker = 1010101011;
int R_XML_NoMemoryMgmt = 111111111;
/*
This returns a value that indicates whether we should
add a finalizer and put the XML node under a C finalizer
to reduce the reference count.
user is an R object that should be an integer vector of length
1 and should be 0, 1 or NA (effectively a logical)
If it is NA, we consult the document object in which the node
is located (or NULL if not part of a document). This document
object can have a value in the _private field that tells us
no to
*/
int
R_XML_getManageMemory(SEXP user, xmlDocPtr doc, xmlNodePtr node)
{
int manage;
if(TYPEOF(user) == STRSXP || TYPEOF(user) == EXTPTRSXP)
return(0);
manage = INTEGER(user)[0]; // TYPEOF(user) == INTSXP ? INTEGER(user)[0] : INTEGER(asInteger(user))[0];
if(manage == R_NaInt) {
if(!doc)
manage = 1;
else
manage = doc->_private != &R_XML_NoMemoryMgmt;
}
#ifdef R_XML_DEBUG
if(manage)
fprintf(stderr, "getManageMemory (%p) %d (type = %d, name = %s)\n", doc, manage, node->type, node->name);fflush(stderr);
#endif
return(manage);
}
SEXP
R_xmlSetNoMemoryMgmt(SEXP r_doc)
{
xmlDocPtr doc;
doc = (xmlDocPtr) R_ExternalPtrAddr(r_doc);
doc->_private = &R_XML_NoMemoryMgmt;
return(NULL_USER_OBJECT);
}
void
initDocRefCounter(xmlDocPtr doc)
{
int *val;
if(doc->_private)
return;
doc->_private = calloc(2, sizeof(int));
val = (int *) doc->_private;
val[1] = R_MEMORY_MANAGER_MARKER;
}
void
incrementDocRefBy(xmlDocPtr doc, int num)
{
int *val;
if(!doc || IS_NOT_OUR_DOC_TO_TOUCH(doc))
return;
if(!doc->_private) {
initDocRefCounter(doc);
}
val = (int *) doc->_private;
(*val) += num;
}
void
incrementDocRef(xmlDocPtr doc)
{
incrementDocRefBy(doc, 1);
}
#define GET_NODE_COUNT(n) \
n->_private ? *((int*) (n)->_private) : 0
int getNodeCount(xmlNodePtr node)
{
int val = 0;
xmlNodePtr p = node->children;
if(!node || IS_NOT_OUR_NODE_TO_TOUCH(node))
return(0);
val = GET_NODE_COUNT(node);
while(p) {
val += getNodeCount(p);
p = p->next;
}
return(val);
}
void
internal_incrementNodeRefCount(xmlNodePtr node)
{
int *val;
if(!node || IS_NOT_OUR_NODE_TO_TOUCH(node) || !node->_private)
return;
val = (int *) node->_private;
(*val)++;
}
SEXP
R_getXMLRefCount(SEXP rnode)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(rnode);
if(!node || IS_NOT_OUR_NODE_TO_TOUCH(node) || !node->_private)
return(ScalarInteger(-1));
return(ScalarInteger(*((int *) node->_private)));
}
int
checkDescendantsInR(xmlNodePtr node, int process)
{
xmlNodePtr p;
if(!node && (process || IS_NOT_OUR_NODE_TO_TOUCH(node)))
return(0);
if(node->_private)
return(1);
p = node->children;
while(p) {
if(checkDescendantsInR(p, 0))
return(1);
p = p->next;
}
return(0);
}
int
internal_decrementNodeRefCount(xmlNodePtr node)
{
int *val, status = 0;
/* */
if(!node || IS_NOT_OUR_NODE_TO_TOUCH(node))
/* if node->_private == NULL, should
* we free this node?, i.e. if it is
* not in a parent or a document.
No! Basically we shouldn't get here
if we have not set the _private. We
set the finalizer having set the _private */
return(status);
if(!node->_private)
return(status);
/* Get the value of the reference count and decrement it by 1.
If we are now at 0, then we can potentially free this node.
Certainly, if we are at 0, we should remove the reference
count memory altogether.
Now that _we_ no longer need the node, perhaps we can free it.
But we have to make certain that we don't free it if
a) it is a child of another node or
b) if it is within a document and that document is still "in play".
To determine if the document is "in play" we look at it's
reference count.
We decrement it by one since we added one to it for this
node.
If that makes the document's reference count 0, then we
free it.
*/
val = (int *) node->_private;
(*val)--;
#ifdef R_XML_DEBUG
fprintf(stderr, "decremented node (%s, %d) to %d (%p) %s\n", node->name, node->type, *val, node, *val == 0 ? "***" : "");fflush(stderr);
#endif
if(*val == 0) {
free(node->_private);
node->_private = NULL;
if(node->doc && !IS_NOT_OUR_DOC_TO_TOUCH(node->doc)) {
val = (int *) node->doc->_private;
if(val) (*val)--;
if(!val || *val == 0) {
/* Consolidate with R_xmlFreeDoc */
#ifdef R_XML_DEBUG
fprintf(stderr, "releasing document (for node) %p %s (%s)\n", node->doc, node->doc->URL ? node->doc->URL : "?", val ? "has zero count" : "no count");fflush(stderr);
#endif
if(val)
free(node->doc->_private);
node->doc->_private = NULL;
xmlFreeDoc(node->doc);
status = 1;
R_numXMLDocsFreed++;
}
} else if(!node->parent) {
/* If the node is not in a tree by having a parent, then
* check the children and if they aren't being referenced
by an R variable, we can free those too. */
int hold;
hold = checkDescendantsInR(node, 1);
if(!hold) {
#ifdef R_XML_DEBUG
fprintf(stderr, "Calling xmlFreeNode() for %p (type = %d)\n", node, node->type);fflush(stderr);
#endif
xmlFreeNode(node);
status = 1;
}
} else {
/* So we have a parent. But what if that parent is not
being held as an R variable. We need to free the node.
We need to make this smarter to see what parts of the
tree we can remove. For instance, we might be holding
onto this one, but not the parent, but that parent has
a second child which is being held onto.
So we go to the top of the node tree and check for its descendants
*/
int hold;
xmlNodePtr p = node->parent;
while(p->parent)
p = p->parent;
hold = checkDescendantsInR(p, 0);
if(!hold) {
#ifdef R_XML_DEBUG
fprintf(stderr, "Calling xmlFree() for %p\n", node);fflush(stderr);
#endif
xmlFree(p); //XXX xmlFree() or xmlFreeNode() ?
status = 1;
}
}
}
return(status);
}
void
decrementNodeRefCount(SEXP rnode)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(rnode);
int status;
status = internal_decrementNodeRefCount(node);
if(status)
R_ClearExternalPtr(rnode);
}
int
clearNodeMemoryManagement(xmlNodePtr node)
{
xmlNodePtr tmp;
int ctr = 0;
if(node->_private) {
int a, b;
// This compares and int and a pointer. Changed to be like NodeGC.h
// int isOurs = (a = node->_private != &R_XML_NoMemoryMgmt) && (b = ((int *)(node->_private))[1] == (int *) &R_XML_MemoryMgrMarker);
int isOurs = (a = node->_private != &R_XML_NoMemoryMgmt) && (b = ((int *)(node->_private))[1] == R_MEMORY_MANAGER_MARKER);
if(isOurs) {
#if R_XML_DEBUG
fprintf(stderr, "Removing memory management from %p, %s\n", node, node->name);fflush(stderr);
#endif
free(node->_private);
ctr++;
}
node->_private = NULL;
}
tmp = node->children;
while(tmp) {
if(tmp)
ctr += clearNodeMemoryManagement(tmp);
tmp = tmp->next;
}
return(ctr);
}
SEXP
R_clearNodeMemoryManagement(SEXP r_node)
{
xmlNodePtr node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
int val;
if(!node)
return(ScalarInteger(-1));
val = clearNodeMemoryManagement(node);
return(ScalarInteger(val));
}
SEXP
R_xmlRefCountEnabled(void)
{
int ans =
#ifdef XML_REF_COUNT_NODES
1;
#else
0;
#endif
return(ScalarLogical(ans));
}
XML/src/ExpatParse.c 0000644 0001751 0000144 00000006312 14106741723 013746 0 ustar hornik users /* only compile this if LIBEXPAT is defined. */
#ifdef LIBEXPAT
/*
File that provides the entry point for an event driven XML parser
that performs callbacks to the different user-level functions in
the closure passed to it.
* See Copyright for the license status of this software.
*/
#include "EventParse.h"
#include "Utils.h" /* For the findFunction and invokeFunction. */
#include "RSCommon.h"
#include "ExpatParse.h"
void
RS_XML(initParser)(XML_Parser parser, RS_XMLParserData *parserData)
{
XML_SetUserData(parser, parserData);
XML_SetElementHandler(parser, RS_XML(startElement), RS_XML(endElement));
XML_SetCommentHandler(parser, RS_XML(commentHandler));
XML_SetExternalEntityRefHandler(parser, RS_XML(externalEntityHandler));
XML_SetUnparsedEntityDeclHandler(parser, RS_XML(entityDeclarationHandler));
XML_SetCharacterDataHandler(parser, RS_XML(textHandler));
XML_SetProcessingInstructionHandler(parser, RS_XML(processingInstructionHandler));
XML_SetCdataSectionHandler(parser, RS_XML(startCdataSectionHandler), RS_XML(endCdataSectionHandler));
XML_SetBase(parser, parserData->fileName);
XML_SetNotStandaloneHandler(parser, RS_XML(notStandAloneHandler));
}
int RS_XML(parse)(FILE *file, USER_OBJECT_ handlers)
{
RS_XMLParserData *parserData;
parserData = RS_XML(createParserData)(handlers);
return(RS_XML(parseWithParserData)(file, parserData));
}
int RS_XML(parseWithParserData)(FILE *file, RS_XMLParserData *parserData)
{
char buf[BUFSIZ];
int done;
XML_Parser parser = XML_ParserCreate(NULL);
RS_XML(initParser)(parser, parserData);
do {
size_t len = fread(buf, 1, sizeof(buf), file);
done = len < sizeof(buf);
if (!XML_Parse(parser, buf, len, done)) {
Rf_warning("%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
} while (!done);
XML_ParserFree(parser);
return 0;
}
int
RS_XML(parseBufferWithParserData)(char *buf, RS_XMLParserData *parserData)
{
int status;
XML_Parser parser = XML_ParserCreate(NULL);
RS_XML(initParser)(parser, parserData);
status = XML_Parse(parser, buf, strlen(buf), 1);
if(status == 0) {
const char *msg = XML_ErrorString(XML_GetErrorCode(parser));
Rf_error("XML Parser Error: %s", msg);
}
return(status);
}
int
RS_XML(externalEntityHandler)(XML_Parser parser, const XML_Char *context,
const XML_Char *base, const XML_Char *systemId,
const XML_Char *publicId)
{
RS_XMLParserData *parserData = (RS_XMLParserData*)XML_GetUserData(parser);
USER_OBJECT_ opArgs;
int i, num;
const XML_Char *xml_args[4];
num = sizeof(xml_args)/sizeof(xml_args[0]);
xml_args[0] = context; xml_args[1] = base;xml_args[2] = systemId; xml_args[3] = publicId;
opArgs = NEW_LIST(num);
for(i =0;i < num; i++) {
RECURSIVE_DATA(opArgs)[i] = NEW_CHARACTER(1);
CHARACTER_DATA(RECURSIVE_DATA(opArgs)[i])[0] = ENC_COPY_TO_USER_STRING(xml_args[i] ? xml_args[i] : "");
}
RS_XML(callUserFunction)("externalEntity", NULL, parserData, opArgs);
return(1); /* ok to go on */
}
#else
/* Something to avoid an empty file.*/
void
XML_Expat_unused_dummy(void)
{
}
#endif /* only if LIBEXPAT is defined */
XML/src/Makevars.in 0000644 0001751 0000144 00000000136 13607633744 013635 0 ustar hornik users PKG_CPPFLAGS= @PKG_CPPFLAGS@ @LANGUAGE_DEFS@ @XMLSEC_DEFS@ -I. @LIBXML2@
PKG_LIBS= @PKG_LIBS@
XML/src/RUtils.c 0000644 0001751 0000144 00000023370 14327573457 013133 0 ustar hornik users #include "Utils.h"
#include
/*
Utilities used in the R XML parsing facilities for invoking user-level functions from C.
* See Copyright for the license status of this software.
*/
#include "Rinternals.h" /* Macros, etc. */
USER_OBJECT_
R_makeXMLContextRef(xmlParserCtxtPtr ctx)
{
USER_OBJECT_ ans;
PROTECT(ans = R_MakeExternalPtr(ctx, Rf_install(XML_PARSER_CONTEXT_TYPE_NAME), R_NilValue));
SET_CLASS(ans, mkString(XML_PARSER_CONTEXT_TYPE_NAME));
UNPROTECT(1);
return(ans);
}
USER_OBJECT_ R_InternalRecursiveApply(USER_OBJECT_ top, USER_OBJECT_ func, USER_OBJECT_ klasses);
USER_OBJECT_
RS_XML(invokeFunction)(USER_OBJECT_ fun, USER_OBJECT_ opArgs, USER_OBJECT_ data, xmlParserCtxtPtr context)
{
int i;
long n;
USER_OBJECT_ c, call;
USER_OBJECT_ ans;
int addContext = 0;
if(context && TYPEOF(fun) == CLOSXP && OBJECT(fun) && R_isInstanceOf(fun, XML_PARSE_CONTEXT_FUNCTION))
addContext = 1;
n = Rf_length(opArgs) + addContext + 1;
if(data)
n++;
if(n > 0) {
#if 1
PROTECT(call = allocVector(LANGSXP, n));
c = call;
SETCAR(call, fun); c = CDR(c);
if(addContext) {
SETCAR(c, R_makeXMLContextRef(context));
c = CDR(c);
}
for (i = 0; i < Rf_length(opArgs); i++) {
SETCAR(c, VECTOR_ELT(opArgs, i));
c = CDR(c);
}
if(data) {
SETCAR(c, data);
SET_TAG(c, Rf_install(".state"));
}
#else
PROTECT(c = call = allocList(n));
if(addContext) {
SETCAR(c, R_makeXMLContextRef(context));
c = CDR(c);
}
for (i = 0; i < GET_LENGTH(opArgs); i++) {
SETCAR(c, VECTOR_ELT(opArgs, i));
c = CDR(c);
}
if(data) {
SETCAR(c, data);
SET_TAG(c, Rf_install(".state"));
}
call = LCONS(fun, call);
UNPROTECT(1);
#endif
} else {
PROTECT(call = allocVector(LANGSXP, 1 + addContext));
SETCAR(call, fun);
if(addContext)
SETCAR(CDR(call), R_makeXMLContextRef(context));
}
ans = eval(call, R_GlobalEnv);
UNPROTECT(1);
return(ans);
}
USER_OBJECT_
RS_XML(RecursiveApply)(USER_OBJECT_ top, USER_OBJECT_ func, USER_OBJECT_ klasses)
{
USER_OBJECT_ ans;
PROTECT(top = duplicate(top));
ans = R_InternalRecursiveApply(top, func, klasses);
UNPROTECT(1);
return(ans);
}
USER_OBJECT_
R_InternalRecursiveApply(USER_OBJECT_ top, USER_OBJECT_ func, USER_OBJECT_ klasses)
{
int CHILD_NODE = 2, i;
USER_OBJECT_ kids;
int numChildren;
USER_OBJECT_ args, tmp;
if(GET_LENGTH(top) > CHILD_NODE) {
kids = VECTOR_ELT(top, CHILD_NODE);
numChildren = GET_LENGTH(kids);
/* Do the children first. */
PROTECT(args = NEW_LIST(1));
PROTECT(tmp = NEW_LIST(numChildren));
for(i = 0; i < numChildren; i++) {
SET_VECTOR_ELT(tmp, i, R_InternalRecursiveApply(VECTOR_ELT(kids, i), func, klasses));
}
SET_VECTOR_ELT(top, CHILD_NODE, tmp);
UNPROTECT(2);
}
PROTECT(args = NEW_LIST(1));
SET_VECTOR_ELT(args, 0, top);
tmp = RS_XML(invokeFunction)(func, args, NULL, NULL); /*XXX get the context and user data!!! */
UNPROTECT(1);
return(tmp);
}
USER_OBJECT_
RS_XML_SubstituteEntitiesDefault(USER_OBJECT_ replaceEntities)
{
int value;
USER_OBJECT_ ans;
value = xmlSubstituteEntitiesDefault(LOGICAL_DATA(replaceEntities)[0]);
ans = NEW_LOGICAL(1);
LOGICAL_DATA(ans)[0] = value;
return(ans);
}
#include
/* Simple macro for expanding ENTRY(x, n) to {"", (DL_FUNC) &, } */
#define ENTRY(name, n) { #name, (DL_FUNC) &name, n }
static R_CallMethodDef callMethods[] = {
ENTRY(RS_XML_RecursiveApply, 3),
#ifdef UNUSED_DOT_CALLS
ENTRY(RS_XML_HtmlParseTree, 7),
ENTRY(RS_XML_setDoc, 2),
ENTRY(R_xmlNsAsCharacter, 1),
ENTRY(R_addXMLNodeFinalizer, 1),
#endif
ENTRY(RS_XML_getDTD, 5),
ENTRY(RS_XML_libxmlVersion, 0),
ENTRY(RS_XML_Parse, 18),
ENTRY(RS_XML_ParseTree, 21),
ENTRY(R_newXMLDtd, 5),
ENTRY(R_newXMLDoc, 3),
ENTRY(R_newXMLNode, 6),
ENTRY(R_newXMLTextNode, 3),
ENTRY(R_xmlNewComment, 3),
ENTRY(R_newXMLCDataNode, 3),
ENTRY(R_newXMLPINode, 4),
ENTRY(R_xmlNewNs, 3),
ENTRY(R_xmlSetNs, 3),
ENTRY(R_xmlRootNode, 3),
ENTRY(R_insertXMLNode, 4),
ENTRY(R_saveXMLDOM, 6),
ENTRY(R_xmlCatalogResolve, 3),
ENTRY(RS_XML_xmlNodeNumChildren, 1),
ENTRY(RS_XML_unsetDoc, 4),
ENTRY(RS_XML_printXMLNode, 6),
ENTRY(RS_XML_dumpHTMLDoc, 5),
ENTRY(RS_XML_removeChildren, 3),
ENTRY(RS_XML_clone, 3),
ENTRY(RS_XML_addNodeAttributes, 2),
ENTRY(RS_XML_removeNodeAttributes, 3),
ENTRY(RS_XML_getNsList, 2),
ENTRY(RS_XML_setNodeName, 2),
ENTRY(RS_XML_SubstituteEntitiesDefault, 1),
ENTRY(RS_XML_getNextSibling, 3),
ENTRY(R_getXMLNodeDocument, 1),
ENTRY(RS_XML_createDocFromNode, 1),
ENTRY(R_removeInternalNode, 2),
ENTRY(RS_XML_replaceXMLNode, 3),
ENTRY(RS_XML_xmlAddSiblingAt, 4),
ENTRY(RS_XML_loadCatalog, 1),
ENTRY(RS_XML_clearCatalog, 0),
ENTRY(RS_XML_catalogAdd, 3),
ENTRY(RS_XML_catalogDump, 1),
ENTRY(RS_XML_setDocumentName, 2),
ENTRY(RS_XML_setKeepBlanksDefault, 1),
ENTRY(R_getDocEncoding, 1),
ENTRY(R_getLineNumber, 1),
ENTRY(RS_XML_xpathEval, 9),
ENTRY(RS_XML_xmlNodeChildrenReferences, 3),
ENTRY(RS_XML_freeDoc, 1),
ENTRY(RS_XML_setRootNode, 2),
ENTRY(R_getNodeChildByIndex, 3),
ENTRY(RS_XML_setDocEl, 2),
ENTRY(RS_XML_isDescendantOf, 3),
ENTRY(RS_XML_getStructuredErrorHandler, 0),
ENTRY(RS_XML_setStructuredErrorHandler, 1),
ENTRY(R_convertDOMToHashTree, 4),
ENTRY(R_parseURI, 1),
ENTRY(R_getXMLFeatures, 0),
ENTRY(R_xmlReadMemory, 5), //XXX
ENTRY(R_xmlReadFile, 3), //XXX
ENTRY(RS_XML_internalNodeNamespaceDefinitions, 2),
ENTRY(R_libxmlTypeTable_names, 2),
ENTRY(R_libxmlTypeTable_lookup, 3),
ENTRY(RS_XML_xmlSchemaValidateDoc, 4),
ENTRY(R_XML_indexOfChild, 1),
ENTRY(RS_XML_xmlStopParser, 1),
ENTRY(R_clearNodeMemoryManagement, 1),
ENTRY(R_XMLInternalDocument_free, 1),
ENTRY(R_addXMLInternalDocument_finalizer, 2),
ENTRY(R_createXMLNode, 4),
ENTRY(RS_XML_xmlNodeName, 1),
ENTRY(RS_XML_xmlNodeNamespace, 1),
ENTRY(RS_XML_xmlNodeAttributes, 3),
// ENTRY(RS_XML_xmlNodeChildrenReferences, 3), // duplicate
ENTRY(R_xmlNodeValue, 3),
ENTRY(R_setXMLInternalTextNode_value, 2),
ENTRY(RS_XML_xmlNodeParent, 2),
ENTRY(R_getXMLNsRef, 1), // XXX
ENTRY(R_setXMLInternalTextNode_noenc, 1),
ENTRY(R_isNodeChildOfAt, 3),
ENTRY(R_findXIncludeStartNodes, 2),
ENTRY(RS_XML_removeAllNodeNamespaces, 1),
ENTRY(RS_XML_removeNodeNamespaces, 2),
ENTRY(R_matchNodesInList, 3),
ENTRY(RS_XML_copyNodesToDoc, 3),
ENTRY(RS_XML_getDocumentName, 1),
ENTRY(RS_XML_getDefaultValiditySetting, 1),
ENTRY(RS_XML_xmlXIncludeProcessFlags, 2),
ENTRY(RS_XML_xmlXIncludeProcessTreeFlags, 2),
ENTRY(R_convertXMLNsRef, 1),
ENTRY(R_replaceNodeWithChildren, 1),
{NULL, NULL, 0}
};
static R_CMethodDef cmethods[] = {
ENTRY(RSXML_setErrorHandlers, 0),
ENTRY(xmlInitializeCatalog, 0),
{NULL, NULL, 0}
};
void
R_init_XML(DllInfo *dll)
{
R_useDynamicSymbols(dll, FALSE);
R_registerRoutines(dll, cmethods, callMethods, NULL, NULL);
}
Rboolean
R_isInstanceOf(USER_OBJECT_ obj, const char *klass)
{
USER_OBJECT_ klasses;
int n, i;
klasses = GET_CLASS(obj);
n = GET_LENGTH(klasses);
for(i = 0; i < n ; i++) {
if(strcmp(CHAR_DEREF(STRING_ELT(klasses, i)), klass) == 0)
return(TRUE);
}
return(FALSE);
}
SEXP
RS_XML_getStructuredErrorHandler(void)
{
SEXP ans;
PROTECT(ans = NEW_LIST(2));
SET_VECTOR_ELT(ans, 0, R_MakeExternalPtr(xmlGenericErrorContext, Rf_install("xmlGenericErrorContext"), R_NilValue));
SET_VECTOR_ELT(ans, 1, R_MakeExternalPtr((void *)xmlStructuredError, Rf_install("xmlStructuredErrorFunc"), R_NilValue));
UNPROTECT(1);
return(ans);
}
SEXP
RS_XML_setStructuredErrorHandler(SEXP els)
{
void *ctx;
xmlStructuredErrorFunc handler;
SEXP fun, sym;
fun = VECTOR_ELT(els, 0);
sym = VECTOR_ELT(els, 1);
if(sym != R_NilValue && TYPEOF(sym) != EXTPTRSXP) {
Rf_error("invalid symbol object for XML error handler. Need an external pointer, e.g from getNativeSymbolInfo");
}
if(fun == R_NilValue)
ctx = NULL;
else if(TYPEOF(fun) == EXTPTRSXP)
ctx = R_ExternalPtrAddr(fun);
else {
ctx = fun = Rf_duplicate(fun); /* Should R_PreserveObject and
* ReleaseObject() but then we have
to be able "remember" if it is an
R function or not.*/
R_PreserveObject(fun);
}
handler = (sym == R_NilValue) ? NULL : (xmlStructuredErrorFunc) R_ExternalPtrAddr(sym);
xmlSetStructuredErrorFunc(ctx, handler);
return(ScalarLogical(TRUE));
}
SEXP
CreateCharSexpWithEncoding(const xmlChar *encoding, const xmlChar *str)
{
SEXP ans;
#ifdef HAVE_R_CETYPE_T
cetype_t enc = CE_NATIVE;
if(encoding == (const xmlChar *) NULL || xmlStrcmp(encoding, (const xmlChar *) "")) {
enc = CE_NATIVE;
} else if(xmlStrcmp(encoding, (xmlChar *)"UTF-8") == 0 || xmlStrcmp(encoding, (xmlChar *)"utf-8") == 0)
enc = CE_UTF8;
else if(xmlStrcmp(encoding, (xmlChar *)"ISO-8859-1") == 0 || xmlStrcmp(encoding, (xmlChar *)"iso-8859-1") == 0)
enc = CE_LATIN1;
else {
str = (xmlChar *)translateChar(mkChar((const char *) str));
}
// REprintf("encoding: %d\n", enc);
ans = mkCharCE((const char *) str, enc);
#else
ans = mkChar((const char *) str);
#endif
return(ans);
}
SEXP
R_lookString(SEXP rstr)
{
const char *str;
str = CHAR(STRING_ELT(rstr, 0));
return(ScalarInteger((int) strlen(str)));
}
#if 0
#include
SEXP
R_relativeURL(SEXP r_url, SEXP r_base)
{
xmlChar *url, *base;
const xmlChar *ans;
SEXP rans;
url = CHAR_DEREF(STRING_ELT(r_url, 0));
base = CHAR_DEREF(STRING_ELT(r_base, 0));
ans = xmlBuildRelativeURI(url, base);
rans = ScalarString(COPY_TO_USER_STRING(ans));
xmlFree(ans);
return(rans);
}
#endif
XML/src/schema.c 0000644 0001751 0000144 00000014123 14106741723 013131 0 ustar hornik users #include "RS_XML.h"
#include "RSCommon.h"
#include
#include
#include
#include "Utils.h"
#if 0
#define R_GET_EXTERNAL_REF(type, name) \
type \
name(SEXP obj) \
{ \
SEXP ref = GET_SLOT(obj, Rf_install("ref")); \
if(TYPEOF(ref) != EXTPTRSXP) { \
Rf_error("Expected external pointer object"); \
} \
\
if(R_ExternalPtrTag(ref) != Rf_install(#type)) { \
Rf_error("Expected external pointer to have internal tag %s, got %s", \
#type, PRINTNAME(ref)); \
} \
\
return((type) R_ExternalPtrAddr(ref)); \
}
R_GET_EXTERNAL_REF(xmlHashTablePtr, R_libxmlTypeTableGetRef)
R_GET_EXTERNAL_REF(xmlSchemaElementPtr, R_libxmlSchemaElementGetRef)
#endif
void *
R_getExternalRef(SEXP obj, const char *className)
{
SEXP ref = GET_SLOT(obj, Rf_install("ref"));
void *ans;
if(TYPEOF(ref) != EXTPTRSXP) {
Rf_error("Expected external pointer object");
}
if(className && R_ExternalPtrTag(ref) != Rf_install(className)) {
Rf_error("Expected external pointer to have internal tag %s, got %s",
className, CHAR(PRINTNAME(R_ExternalPtrTag(ref))) );
}
ans = R_ExternalPtrAddr(ref);
if(!ans) {
Rf_error("Got NULL value in reference for %s", className);
}
return(ans);
}
typedef struct {
int pos;
USER_OBJECT_ els;
USER_OBJECT_ names;
char *elType;
} HashGatherer;
#if LIBXML_VERSION >= 20908
# define CONST const
#else
# define CONST
#endif
static void
getKeys(void *el, void *data, CONST xmlChar *name)
{
HashGatherer *d = (HashGatherer *)data;
SET_STRING_ELT(d->names, d->pos, COPY_TO_USER_STRING(name));
if(d->elType) {
SET_VECTOR_ELT(d->els, d->pos, R_makeRefObject(el, d->elType));
}
d->pos++;
}
USER_OBJECT_
R_libxmlTypeTable_names(USER_OBJECT_ table, USER_OBJECT_ s_elType)
{
xmlHashTablePtr t;
int n = 0, ctr = 0;
int getElements = GET_LENGTH(s_elType) > 0;
HashGatherer d = {0, NULL_USER_OBJECT, NULL_USER_OBJECT, NULL};
t = R_getExternalRef(table, NULL); /* R_libxmlTypeTableGetRef(table); */
n = xmlHashSize(t);
PROTECT(d.names = NEW_CHARACTER(n)); ctr++;
if(getElements) {
PROTECT(d.els = NEW_LIST(n)); ctr++;
d.elType = (char *) CHAR_DEREF(STRING_ELT(s_elType, 0));
}
xmlHashScan(t, getKeys, &d);
if(getElements)
SET_NAMES(d.els, d.names);
else
d.els = d.names;
UNPROTECT(ctr);
return(d.els);
}
USER_OBJECT_
R_libxmlTypeTable_lookup(USER_OBJECT_ table, USER_OBJECT_ name, USER_OBJECT_ s_elType)
{
xmlHashTablePtr t;
USER_OBJECT_ ans;
void *p;
t = R_getExternalRef(table, NULL); /* R_libxmlTypeTableGetRef(table); */
p = xmlHashLookup(t, (const xmlChar *)CHAR_DEREF(STRING_ELT(name, 0)));
ans = R_makeRefObject(p, CHAR_DEREF(STRING_ELT(s_elType, 0)));
return(ans);
}
#define SchemaElement(id, type) \
USER_OBJECT_ \
R_libxmlTypeTable_##id(USER_OBJECT_ s) \
{ \
xmlSchemaPtr schema; \
schema = R_getExternalRef(s, "xmlSchemaRef"); \
\
return(schema->id != NULL ? R_makeRefObject(schema->id, type) : R_NilValue); \
}
SchemaElement(elemDecl, "SchemaElementTable")
SchemaElement(typeDecl, "SchemaTypeTable")
SchemaElement(attrDecl, "SchemaAttributeTable")
SchemaElement(attrgrpDecl, "SchemaAttributeGroupTable")
SchemaElement(notaDecl, "SchemaNotationTable")
/*
USER_OBJECT_
R_libxmlTypeTable_elemDecl(USER_OBJECT_ s)
{
xmlSchemaPtr schema;
schema = R_getExternalRef(s, "xmlSchemaRef");
return(R_makeRefObject(schema->typeDecl, "SchemaElementTable"));
}
*/
#include
typedef struct {
SEXP fun;
} R_SchemaValidCallback;
void
R_schemaValidityFunctionCall(R_SchemaValidCallback *ctx, int warning, const char *msg, va_list args)
{
SEXP arg;
char buf[10000];
vsnprintf(buf, sizeof(buf)/sizeof(buf[0]), msg, args);
PROTECT(arg = mkString(buf));
SET_CLASS(arg, mkString(warning ? "XMLSchemaWarning" : "XMLSchemaError"));
SETCAR(CDR(ctx->fun), arg);
Rf_eval(ctx->fun, R_GlobalEnv);
UNPROTECT(1);
}
void
R_schemaValidityErrorFunc(R_SchemaValidCallback *ctx, const char *msg, ...)
{
va_list args;
va_start(args, msg);
R_schemaValidityFunctionCall(ctx, 0, msg, args);
va_end(args);
}
void
R_schemaValidityWarningFunc(R_SchemaValidCallback *ctx, const char *msg, ...)
{
va_list args;
va_start(args, msg);
R_schemaValidityFunctionCall(ctx, 1, msg, args);
va_end(args);
}
SEXP
RS_XML_xmlSchemaValidateDoc(SEXP r_schema, SEXP r_doc, SEXP r_options, SEXP r_errorHandlers)
{
xmlSchemaValidCtxtPtr ctxt;
xmlDocPtr doc = (xmlDocPtr) R_ExternalPtrAddr(r_doc);
xmlSchemaPtr schema = (xmlSchemaPtr) R_ExternalPtrAddr(r_schema);
// int nprot = 0;
// ctxt = (xmlSchemaValidCtxtPtr) R_ExternalPtrAddr(r_ctxt);
int status;
int numErrHandlers;
ctxt = xmlSchemaNewValidCtxt(schema);
if(LENGTH(r_options))
xmlSchemaSetValidOptions(ctxt, INTEGER(r_options)[0]);
numErrHandlers = Rf_length(r_errorHandlers);
if(numErrHandlers > 0) {
R_SchemaValidCallback cbinfo;
PROTECT(cbinfo.fun = allocVector(LANGSXP, 2));
SETCAR(cbinfo.fun, VECTOR_ELT(r_errorHandlers, 0));
xmlSchemaSetValidErrors(ctxt, (xmlSchemaValidityErrorFunc) R_schemaValidityErrorFunc,
(xmlSchemaValidityWarningFunc) R_schemaValidityWarningFunc, &cbinfo);
}
status = xmlSchemaValidateDoc(ctxt, doc);
xmlSchemaFreeValidCtxt(ctxt); /* R_alloc this if possible. */
if(numErrHandlers > 0) UNPROTECT(1);
return(ScalarInteger(status));
}
#if 0
SEXP
RS_XML_xmlSchemaNewValidCtxt(SEXP r_schema, SEXP r_options, SEXP r_errorHandlers)
{
xmlSchemaPtr schema = (xmlSchemaPtr) R_ExternalPtrAddr(r_schema);
xmlSchemaValidCtxtPtr ctxt;
int numErrHandlers;
ctxt = xmlSchemaNewValidCtxt(schema);
if(LENGTH(r_options))
xmlSchemaSetValidOptions(ctxt, INTEGER(r_options)[0]);
numErrHandlers = LENGTH(r_errorHandlers);
if(numErrHandlers > 0) {
R_SchemaValidCallback *cbinfo = (R_SchemaValidCallback*) malloc(sizeof(R_SchemaValidCallback));
cbinfo->fun = VECTOR_ELT(r_errorHandlers);
xmlSchemaSetValidErrors(routine);
}
return();
}
#endif
XML/src/XMLEventParse.c 0000644 0001751 0000144 00000063036 14553462406 014341 0 ustar hornik users #include "EventParse.h"
#include "DocParse.h"
#define R_USE_XML_ENCODING 1
#include "Utils.h"
#include
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
#ifdef _WIN32
/* on Unix, there is a configure check */
# if LIBXML_VERSION < 20627 || LIBXML_VERSION > 21100
# define NO_CHECKED_ENTITY_FIELD
# endif
#endif
static USER_OBJECT_ createSAX2AttributesList(const xmlChar **attributes, int nb_attributes, int nb_defaulted, const xmlChar *encoding);
/*
This is an event driven parsing implementation for R & S
using the libxml (http://xmlsoft.org) rather than Jim Clark's expat.
It works much the same way, but has some advantages
a) only one library need be installed for both document
and event parsing
b) the libxml tools can read data via ftp and http.
Both expat and libxml provide the SAX interface and allow us to share
a great deal of code between the two event parser implementations
within this package.
*/
void RS_XML(startElementHandler)(void *ctx, const xmlChar *name, const xmlChar **atts);
void RS_XML(commentElementHandler)(void *ctx, const xmlChar *val);
void RS_XML(charactersHandler)(void *user_data, const xmlChar *ch, int len);
void RS_XML(endElementHandler)(void *ctx, const xmlChar *name);
void RS_XML(startDocumentHandler)(void *ctx);
void RS_XML(endDocumentHandler)(void *ctx);
void RS_XML(cdataBlockHandler)(void *ctx, const xmlChar *value, int len);
void RS_XML(piHandler)(void *ctx, const xmlChar *target, const xmlChar *data);
void RS_XML(entityDeclaration)(void *ctx, const xmlChar *name, int type, const xmlChar *publicId,
const xmlChar *systemId, xmlChar *content);
xmlEntityPtr RS_XML(getEntityHandler)(void *userData, const xmlChar *name);
xmlEntityPtr RS_XML(getParameterEntityHandler)(void *userData, const xmlChar *name);
int RS_XML(isStandAloneHandler)(void *ctx);
void RS_XML(warningHandler)(void *ctx, const char *msg, ...);
void RS_XML(errorHandler)(void *ctx, const char *format, ...);
void RS_XML(fatalErrorHandler)(void *ctx, const char *msg, ...);
#if LIBXML_VERSION < 21200
void RS_XML(structuredErrorHandler)(void *ctx, xmlErrorPtr err);
#else
void RS_XML(structuredErrorHandler)(void *ctx, const struct _xmlError *err);
#endif
static void RS_XML(initXMLParserHandler)(xmlSAXHandlerPtr xmlParserHandler, int saxVersion);
USER_OBJECT_
createSAX2AttributesList(const xmlChar **attributes, int nb_attributes, int nb_defaulted, const xmlChar *encoding)
{
int i;
const char **ptr;
USER_OBJECT_ attr_names;
USER_OBJECT_ attr_values;
USER_OBJECT_ nsURI, nsNames;
if(nb_attributes < 1)
return(NULL_USER_OBJECT);
PROTECT(attr_values = NEW_CHARACTER(nb_attributes));
PROTECT(attr_names = NEW_CHARACTER(nb_attributes));
PROTECT(nsURI = NEW_CHARACTER(nb_attributes));
PROTECT(nsNames = NEW_CHARACTER(nb_attributes));
ptr = (const char **) attributes; /*XXX */
for(i=0; i < nb_attributes; i++, ptr+=5) {
char *tmp;
int len;
len = (int)(ptr[4] - ptr[3] + 1);
tmp = malloc(sizeof(char) * len);
if(!tmp) {
Rf_error("Cannot allocate space for attribute of length %d", (int) (ptr[4] - ptr[3] + 2));
}
memcpy(tmp, ptr[3], ptr[4] - ptr[3]);
tmp[len-1] = '\0'; /*XXX*/
SET_STRING_ELT(attr_values, i, ENC_COPY_TO_USER_STRING(tmp));
free(tmp);
SET_STRING_ELT(attr_names, i, ENC_COPY_TO_USER_STRING(ptr[0]));
if(ptr[2]) {
SET_STRING_ELT(nsURI, i, ENC_COPY_TO_USER_STRING(ptr[2]));
if(ptr[1])
SET_STRING_ELT(nsNames, i, ENC_COPY_TO_USER_STRING(ptr[1]));
}
}
SET_NAMES(nsURI, nsNames);
SET_NAMES(attr_values, attr_names);
Rf_setAttrib(attr_values, Rf_install("namespaces"), nsURI);
UNPROTECT(4);
return(attr_values);
}
#ifdef NEED_CLOSE_CALLBACK
/* Is this actually needed? We can ensure that all errors
are caught by R and so ensure that we close things.
*/
int
RS_XML_closeConnectionInput(void *context)
{
int status;
status = RS_XML_readConnectionInput(context, NULL, -1);
return(1);
}
#endif
typedef struct {
SEXP fun;
xmlParserCtxtPtr ctx;
} RFunCtxData;
int
RS_XML_readConnectionInput(void *context, char *buffer, int len)
{
SEXP e, tmp, arg = R_NilValue /* -Wall */;
int n;
int errorOccurred;
const char *str;
int left = len-1, count;
int nprot = 0;
#ifdef R_XML_DEBUG
char *orig = buffer;
#endif
SEXP fun;
xmlParserCtxtPtr ctx;
#ifndef LIBXML2_NEW_BUFFER
ctx = (xmlParserCtxtPtr) context;
fun = ctx->_private;
#else
RFunCtxData *user = (RFunCtxData *) context;
ctx = user->ctx;
fun = user->fun;
#endif
if(len == -1)
return(0);
/* Setup the expression to call the user-supplied R function or call readLines(con, 1)
if they gave us a connection. */
if(isFunction(fun)) {
/* Invoke the user-provided function to get the next line. */
PROTECT(e = allocVector(LANGSXP, 2)); nprot++;
SETCAR(e, fun);
PROTECT(arg = NEW_INTEGER(1)); nprot++;
INTEGER_DATA(arg)[0] = len;
SETCAR(CDR(e), arg);
} else
e = fun;
n = count = 0;
while(n == 0 && left > 0) {
str = NULL;
/* Update the argument to the user-defined function to say how much is left. */
if(isFunction(fun))
INTEGER_DATA(arg)[0] = left;
tmp = R_tryEval(e, R_GlobalEnv, &errorOccurred);
if(errorOccurred || !IS_CHARACTER(tmp)) {
UNPROTECT(nprot);
if ((ctx->sax != NULL) && (ctx->sax->error != NULL)) /* throw an XML error. */
ctx->sax->error(ctx->userData, "Failed to call read on XML connection");
return(-1);
}
if(GET_LENGTH(tmp)) {
str = CHAR_DEREF(STRING_ELT(tmp, 0));
n = (int)strlen(str);
if(n != 0) { /* Just add a new line and do it again. */
if(n > left) {
Rf_warning("string read from XML connection too long for buffer: truncating %s to %d characters", str, left);
}
strncpy(buffer, str, left);
left -= n;
count += n ;
}
} else {
/* Notice that we may have actually added something to the
buffer, specifically a sequence of empty lines \n,
and these will be discarded and not passed to the XML parser
but these are extraneous anyway. Are they?
*/
n = count = 0;
break;
}
}
#ifdef R_XML_DEBUG
fprintf(stderr, "size (len = %d, n=%d, count=%d)\nbuffer= '%s'\nRstring='%s'\n", len, n, count, buffer, str);fflush(stderr);
/* fprintf(stderr, "size (n=%d, count=%d) %s '%s'\n", n, count, str, orig);fflush(stderr); */
#endif
UNPROTECT(nprot);
return(count);
/* return(count == 0 ? -1 : count); */
}
xmlParserCtxtPtr
RS_XML_xmlCreateConnectionParserCtxt(USER_OBJECT_ con)
{
xmlParserInputBufferPtr buf;
xmlParserCtxtPtr ctx = NULL;
#ifdef LIBXML2
ctx = xmlNewParserCtxt();
#ifndef LIBXML2_NEW_BUFFER // < 2.9.1
ctx->_private = (USER_OBJECT_) con;
/* R_chk_calloc */
buf = (xmlParserInputBufferPtr) calloc(1, sizeof(xmlParserInputBuffer));
buf->readcallback = RS_XML_readConnectionInput;
buf->context = (void*) ctx;
buf->raw = NULL; /* buf->buffer; */
xmlBufferPtr tmp = xmlBufferCreate();
buf->buffer = tmp;
#else
RFunCtxData *userData = (RFunCtxData *) R_alloc(sizeof(RFunCtxData), 1);
userData->fun = con;
userData->ctx = ctx;
buf = xmlParserInputBufferCreateIO(RS_XML_readConnectionInput, NULL, userData, XML_CHAR_ENCODING_NONE);
#endif
xmlParserInputPtr input = xmlNewIOInputStream(ctx, buf, XML_CHAR_ENCODING_NONE);
if(!input) {
Rf_error("can't create new IOInputStream");
}
inputPush(ctx, input);
#endif
return(ctx);
}
int
RS_XML(libXMLEventParse)(const char *fileName, RS_XMLParserData *parserData, RS_XML_ContentSourceType asText,
int saxVersion, USER_OBJECT_ r_encoding)
{
xmlSAXHandlerPtr xmlParserHandler;
xmlParserCtxtPtr ctx;
int status;
switch(asText) {
case RS_XML_TEXT:
ctx = xmlCreateDocParserCtxt(CHAR_TO_XMLCHAR(fileName));
break;
case RS_XML_FILENAME:
ctx = xmlCreateFileParserCtxt(fileName);
break;
case RS_XML_CONNECTION:
ctx = RS_XML_xmlCreateConnectionParserCtxt((USER_OBJECT_) fileName);
break;
default:
ctx = NULL;
}
if(ctx == NULL) {
Rf_error("Can't parse %s", fileName);
}
xmlParserHandler = (xmlSAXHandlerPtr) S_alloc(sizeof(xmlSAXHandler), 1);
/* Make certain this is initialized so that we don't have any references to unwanted routines! */
memset(xmlParserHandler, '\0', sizeof(xmlSAXHandler));
RS_XML(initXMLParserHandler)(xmlParserHandler, saxVersion);
parserData->ctx = ctx;
ctx->userData = parserData;
ctx->sax = xmlParserHandler;
if(Rf_length(r_encoding) && STRING_ELT(r_encoding, 0) != R_NaString) {
// Rf_PrintValue(r_encoding);
ctx->encoding = xmlStrdup((const xmlChar *)CHAR(STRING_ELT(r_encoding, 0)));
}
status = xmlParseDocument(ctx);
ctx->sax = NULL;
xmlFreeParserCtxt(ctx);
return(status);
/* Free(xmlParserHandler); */
}
int
R_isBranch(const xmlChar *localname, RS_XMLParserData *rinfo)
{
int n;
if(rinfo->current)
return(-2); /* we are processing a branch */
if((n = GET_LENGTH(rinfo->branches)) > 0) {
int i;
USER_OBJECT_ names = GET_NAMES(rinfo->branches);
for(i = 0 ; i < n ; i++) {
if(strcmp(XMLCHAR_TO_CHAR(localname), CHAR_DEREF(STRING_ELT(names, i))) == 0) {
return(i);
}
}
}
return(-1);
}
char *
getPropertyValue(const xmlChar **ptr)
{
int len;
char *tmp;
len = (int)(ptr[4] - ptr[3] + 1);
tmp = malloc(sizeof(char) * len);
if(!tmp) {
Rf_error("Cannot allocate space for attribute of length %d", (int) (ptr[4] - ptr[3] + 2));
}
memcpy(tmp, ptr[3], ptr[4] - ptr[3]);
tmp[len-1] = '\0'; /*XXX*/
return(tmp);
}
void
R_processBranch(RS_XMLParserData * rinfo,
int branchIndex,
const xmlChar * localname,
const xmlChar * prefix,
const xmlChar * URI,
int nb_namespaces,
const xmlChar ** namespaces,
int nb_attributes,
int nb_defaulted,
const xmlChar ** attributes,
Rboolean sax1)
{
xmlNodePtr node;
node = xmlNewNode(NULL, localname);
if(attributes) {
const xmlChar ** p = attributes;
int i;
if(sax1) {
for(i = 0; *p ; i += 2, p += 2)
xmlSetProp(node, p[0], p[1]); /*??? Do we need to xmlStrdup() this. */
} else {
const xmlChar **ptr = p;
for(i = 0; i < nb_attributes; i++, ptr += 5) {
/*XXX does this get freed later on?*/
xmlSetProp(node, xmlStrdup(ptr[0]), (const xmlChar *)getPropertyValue(ptr));
}
}
}
if(rinfo->current) {
/* Add to children */
xmlAddChild(rinfo->current, node);
} else {
rinfo->top = node;
rinfo->branchIndex = branchIndex;
}
rinfo->current = node;
}
void
R_xmlFreeNode(SEXP node)
{
xmlNodePtr p;
p = R_ExternalPtrAddr(node);
if(p) {
xmlFreeNode(p);
#ifdef R_XML_DEBUG
fprintf(stderr, "Freeing XML node from a branch\n");
#endif
}
R_SetExternalPtrAddr(node, NULL_USER_OBJECT);
}
int numDocsCreated = 0;
void
R_reportDocGC(void)
{
REprintf("\n", numDocsCreated, R_numXMLDocs, R_numXMLDocsFreed);
}
void
R_endBranch(RS_XMLParserData *rinfo,
const xmlChar * localname,
const xmlChar * prefix,
const xmlChar * URI)
{
if(rinfo->current) {
xmlNodePtr tmp;
xmlDocPtr doc = NULL;
tmp = rinfo->current;
if(tmp->parent == NULL) {
/* Call the function with the given node.*/
SEXP fun, args;
USER_OBJECT_ rnode;
if(rinfo->dynamicBranchFunction)
fun = rinfo->dynamicBranchFunction;
else {
fun = VECTOR_ELT(rinfo->branches, rinfo->branchIndex);
}
PROTECT(args = NEW_LIST(1));
if(tmp->doc == NULL) {
doc = xmlNewDoc((const xmlChar*) "1.0");
initDocRefCounter(doc);
xmlDocSetRootElement(doc, tmp);
/* fprintf(stderr, "\n", doc); */
numDocsCreated++;
}
SET_VECTOR_ELT(args, 0, rnode = R_createXMLNodeRef(tmp, rinfo->finalize));
RS_XML(invokeFunction)(fun, args, NULL, rinfo->ctx);
UNPROTECT(1);
/*
xmlFreeNode(rinfo->top);
rinfo->top = NULL;
*/
#if 0
fprintf(stderr, "Finishing branch for %s %s\n", tmp->name, tmp->properties->children->content);
#endif
/* if(rinfo->dynamicBranchFunction)
R_ReleaseObject(rinfo->dynamicBranchFunction);
*/
}
rinfo->current = rinfo->current->parent;
if(rinfo->current && (rinfo->current->type == XML_DOCUMENT_NODE || rinfo->current->type == XML_HTML_DOCUMENT_NODE))
rinfo->current = NULL;
}
}
static int
isBranchFunction(SEXP obj)
{
int i, n;
SEXP classes;
if(TYPEOF(obj) != CLOSXP)
return(0);
classes = GET_CLASS(obj);
n = GET_LENGTH(classes);
for(i = 0; i < n; i++)
if(strcmp(CHAR(STRING_ELT(classes, i)), "SAXBranchFunction") == 0)
return(1);
return(0);
}
static void
RS_XML(xmlSAX2StartElementNs)(void * userData,
const xmlChar * localname,
const xmlChar * prefix,
const xmlChar * URI,
int nb_namespaces,
const xmlChar ** namespaces,
int nb_attributes,
int nb_defaulted,
const xmlChar ** attributes)
{
int i, n;
USER_OBJECT_ tmp, names;
USER_OBJECT_ opArgs, ans;
RS_XMLParserData *rinfo = (RS_XMLParserData*) userData;
DECL_ENCODING_FROM_EVENT_PARSER(rinfo)
if(!localname)
return;
/* if there is a branch function in the branches argument of xmlEventParse() with this name, call that and return.*/
if((i = R_isBranch(localname, rinfo)) != -1) {
R_processBranch(rinfo, i, localname, prefix, URI, nb_namespaces, namespaces, nb_attributes, nb_defaulted, attributes, FALSE);
return;
}
PROTECT(opArgs = NEW_LIST(4));
SET_VECTOR_ELT(opArgs, 0, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 0), 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(localname)));
/* Now convert the attributes list. */
SET_VECTOR_ELT(opArgs, 1, createSAX2AttributesList(attributes, nb_attributes, nb_defaulted, encoding));
PROTECT(tmp = NEW_CHARACTER(1));
if(URI) {
SET_STRING_ELT(tmp, 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(URI)));
SET_NAMES(tmp, ScalarString(CreateCharSexpWithEncoding(encoding, ( (void*)prefix ? prefix : (const xmlChar *)""))));
}
SET_VECTOR_ELT(opArgs, 2, tmp);
UNPROTECT(1);
n = nb_namespaces;
PROTECT(tmp = NEW_CHARACTER(n));
PROTECT(names = NEW_CHARACTER(n));
for(i = 0, n = 0; n < nb_namespaces; n++, i+=2) {
SET_STRING_ELT(tmp, n, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(namespaces[i+1])));
if(namespaces[i])
SET_STRING_ELT(names, n, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(namespaces[i])));
}
SET_NAMES(tmp, names);
SET_VECTOR_ELT(opArgs, 3, tmp);
UNPROTECT(2);
ans = RS_XML(callUserFunction)(HANDLER_FUN_NAME(rinfo, "startElement"), XMLCHAR_TO_CHAR(localname), rinfo, opArgs);
/* If the handler function returned us a SAXBranchFunction function, then we need to build the node's sub-tree and
then invoke the function with that node as the main argument. (It may also get the context/parser.) */
if(isBranchFunction(ans)) {
/* Hold on to the function to avoid it being garbage collected. */
R_PreserveObject(rinfo->dynamicBranchFunction = ans);
/* Start the creation of the node's sub-tree. */
R_processBranch(rinfo, -1, localname, prefix, URI, nb_namespaces, namespaces, nb_attributes, nb_defaulted, attributes, FALSE);
}
UNPROTECT(1);
}
static void
RS_XML(xmlSAX2EndElementNs)(void * ctx,
const xmlChar * localname,
const xmlChar * prefix,
const xmlChar * URI)
{
USER_OBJECT_ args, tmp, fun;
RS_XMLParserData *rinfo = (RS_XMLParserData *) ctx;
DECL_ENCODING_FROM_EVENT_PARSER(rinfo)
if(rinfo->current) {
R_endBranch(rinfo, localname, prefix, URI);
return;
}
PROTECT(args = NEW_LIST(2));
SET_VECTOR_ELT(args, 0, ScalarString(ENC_COPY_TO_USER_STRING(localname)));
PROTECT(tmp = ScalarString(ENC_COPY_TO_USER_STRING(URI ? URI : (const xmlChar *)"")));
if(prefix)
SET_NAMES(tmp, ScalarString(ENC_COPY_TO_USER_STRING(prefix)));
SET_VECTOR_ELT(args, 1, tmp);
fun = findEndElementFun((const char *)localname, rinfo);
if(fun) {
USER_OBJECT_ val = PROTECT(RS_XML(invokeFunction)(fun, args, rinfo->stateObject, rinfo->ctx));
updateState(val, rinfo);
UNPROTECT(1);
} else
RS_XML(callUserFunction)(HANDLER_FUN_NAME(ctx, "endElement"), NULL, (RS_XMLParserData *)ctx, args);
UNPROTECT(2);
}
#if 0
static void
RS_XML(xmlSAX2StartDocument)(void *userData)
{
}
#endif
void
RS_XML(initXMLParserHandler)(xmlSAXHandlerPtr xmlParserHandler, int saxVersion)
{
if(saxVersion == 2) {
xmlParserHandler->initialized = 0;
xmlSAX2InitDefaultSAXHandler(xmlParserHandler, 0);
xmlParserHandler->initialized = XML_SAX2_MAGIC;
xmlParserHandler->startElementNs = RS_XML(xmlSAX2StartElementNs);
xmlParserHandler->endElementNs = RS_XML(xmlSAX2EndElementNs);
xmlParserHandler->startElement = NULL;
xmlParserHandler->endElement = NULL;
xmlParserHandler->serror = RS_XML(structuredErrorHandler);
} else {
xmlParserHandler->startElement = RS_XML(startElementHandler);
xmlParserHandler->endElement = RS_XML(endElementHandler);
}
xmlParserHandler->entityDecl = RS_XML(entityDeclaration);
xmlParserHandler->getEntity = RS_XML(getEntityHandler);
xmlParserHandler->comment = RS_XML(commentElementHandler);
xmlParserHandler->characters = RS_XML(charactersHandler);
xmlParserHandler->processingInstruction = RS_XML(piHandler);
xmlParserHandler->cdataBlock = RS_XML(cdataBlockHandler);
xmlParserHandler->startDocument = RS_XML(startDocumentHandler);
xmlParserHandler->endDocument = RS_XML(endDocumentHandler);
xmlParserHandler->isStandalone = RS_XML(isStandAloneHandler);
xmlParserHandler->fatalError = RS_XML(fatalErrorHandler);
xmlParserHandler->warning = RS_XML(warningHandler);
xmlParserHandler->error = RS_XML(errorHandler);
/* external entity */
xmlParserHandler->internalSubset = NULL;
xmlParserHandler->externalSubset = NULL;
xmlParserHandler->hasInternalSubset = NULL;
xmlParserHandler->hasExternalSubset = NULL;
xmlParserHandler->resolveEntity = NULL;
xmlParserHandler->getParameterEntity = RS_XML(getParameterEntityHandler);
xmlParserHandler->attributeDecl = NULL;
xmlParserHandler->elementDecl = NULL;
xmlParserHandler->notationDecl = NULL;
xmlParserHandler->unparsedEntityDecl = NULL;
xmlParserHandler->setDocumentLocator = NULL;
xmlParserHandler->reference = NULL;
xmlParserHandler->ignorableWhitespace = NULL;
}
void
RS_XML(startElementHandler)(void *userData, const xmlChar *name, const xmlChar **atts)
{
RS_XML(startElement)(userData, (const char *)name, (const char **)atts);
}
void
RS_XML(endElementHandler)(void *ctx, const xmlChar *name)
{
RS_XML(endElement)(ctx, (const char *)name);
}
void
RS_XML(commentElementHandler)(void *ctx, const xmlChar *val)
{
RS_XML(commentHandler)(ctx, (const XML_Char*)val);
}
void
RS_XML(charactersHandler)(void *user_data, const xmlChar *ch, int len)
{
RS_XML(textHandler)(user_data, (const XML_Char*)ch, len);
}
void
RS_XML(startDocumentHandler)(void *ctx)
{
RS_XML(callUserFunction)(HANDLER_FUN_NAME(ctx, "startDocument"), NULL, ((RS_XMLParserData*) ctx), NULL_USER_OBJECT);
}
void
RS_XML(endDocumentHandler)(void *ctx)
{
RS_XML(callUserFunction)(HANDLER_FUN_NAME(ctx, "endDocument"), NULL, ((RS_XMLParserData*) ctx), NULL_USER_OBJECT);
}
void
RS_XML(cdataBlockHandler)(void *ctx, const xmlChar *value, int len)
{
USER_OBJECT_ opArgs;
RS_XMLParserData *parserData = (RS_XMLParserData*) ctx;
DECL_ENCODING_FROM_EVENT_PARSER(parserData)
if(parserData->current) {
xmlAddChild(parserData->current, xmlNewCDataBlock(NULL, value, len));
return;
}
PROTECT(opArgs = NEW_LIST(1));
SET_VECTOR_ELT(opArgs, 0, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 0), 0, ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR(value)));
RS_XML(callUserFunction)(HANDLER_FUN_NAME(parserData, "cdata"), (const char *)NULL, (RS_XMLParserData*)ctx, opArgs);
UNPROTECT(1);
}
void
RS_XML(piHandler)(void *ctx, const xmlChar *target, const xmlChar *data)
{
RS_XML(processingInstructionHandler)(ctx, (const XML_Char*)target, (const XML_Char*)data);
}
//#define RString(x) (x ? mkString(XMLCHAR_TO_CHAR((x))) : NEW_CHARACTER(1))
#define RString(x) (x ? ScalarString(ENC_COPY_TO_USER_STRING(XMLCHAR_TO_CHAR((x)))) : NEW_CHARACTER(1))
/* Relies on the order and numbering of xmlEntityType from entities.h */
static const char * const EntityTypeNames[] = {
"Internal_General",
"External_General_Parsed",
"External_General_Unparsed",
"Internal_Parameter",
"External_Parameter",
"Internal_Predefined"
};
void
RS_XML(entityDeclaration)(void *ctx,
const xmlChar *name, int type, const xmlChar *publicId,
const xmlChar *systemId, xmlChar *content)
{
USER_OBJECT_ fun, opArgs, tmp;
RS_XMLParserData *parserData = (RS_XMLParserData*) ctx;
DECL_ENCODING_FROM_EVENT_PARSER(parserData)
/* check if there is a function to call before making the list of 5 elements. */
fun = RS_XML(findFunction)(HANDLER_FUN_NAME(parserData, "entityDeclaration"), parserData->methods);
if(fun == NULL || fun == NULL_USER_OBJECT)
return;
PROTECT(fun);
PROTECT(opArgs = NEW_LIST(5));
SET_VECTOR_ELT(opArgs, 0, RString(name));
PROTECT(tmp = ScalarInteger(type));
SET_NAMES(tmp, mkString(EntityTypeNames[type-1]));
SET_VECTOR_ELT(opArgs, 1, tmp);
UNPROTECT(1);
SET_VECTOR_ELT(opArgs, 2, RString(content));
SET_VECTOR_ELT(opArgs, 3, RString(systemId));
SET_VECTOR_ELT(opArgs, 4, RString(publicId));
(void) RS_XML(invokeFunction)(fun, opArgs, parserData->stateObject, parserData->ctx);
UNPROTECT(2);
}
static xmlEntityPtr
do_getEntityHandler(void *userData, const xmlChar *name, const char * r_funName)
{
SEXP opArgs, r_ans;
xmlEntityPtr ans = NULL;
RS_XMLParserData *parserData = (RS_XMLParserData*) userData;
DECL_ENCODING_FROM_EVENT_PARSER(parserData)
PROTECT(opArgs = NEW_LIST(1)) ;
SET_VECTOR_ELT(opArgs, 0, ScalarString(ENC_COPY_TO_USER_STRING(name))); /*XXX should we encode this? Done now! */
r_ans = RS_XML(callUserFunction)(r_funName, NULL, (RS_XMLParserData *) userData, opArgs);
PROTECT(r_ans) ;
if(r_ans != NULL_USER_OBJECT && GET_LENGTH(r_ans) > 0) {
if(TYPEOF(r_ans) == STRSXP) {
const char *value;
value = CHAR_DEREF(STRING_ELT(r_ans, 0));
ans = (xmlEntityPtr) malloc(sizeof(xmlEntity));
memset(ans, 0, sizeof(xmlEntity));
ans->type = XML_ENTITY_DECL;
ans->etype = XML_INTERNAL_GENERAL_ENTITY;
ans->name = xmlStrdup(name);
ans->orig = NULL; // xmlStrdup(CHAR_TO_XMLCHAR(value));
ans->content = xmlStrdup(CHAR_TO_XMLCHAR(value));
ans->length = (int)strlen(value);
#ifndef NO_CHECKED_ENTITY_FIELD
ans->checked = 1;
#endif
}
}
UNPROTECT(2);
return(ans);
}
xmlEntityPtr
RS_XML(getEntityHandler)(void *userData, const xmlChar *name)
{
return(do_getEntityHandler(userData, name, HANDLER_FUN_NAME(userData, "getEntity")));
}
xmlEntityPtr
RS_XML(getParameterEntityHandler)(void *userData, const xmlChar *name)
{
return(do_getEntityHandler(userData, name, HANDLER_FUN_NAME(userData, "getParameterEntity")));
}
int
RS_XML(isStandAloneHandler)(void *ctx)
{
return(1);
}
void
RS_XML(fatalErrorHandler)(void *ctx, const char *format, ...)
{
const char *msg = "error message unavailable";
va_list args;
va_start(args, format);
if(strcmp(format, "%s") == 0)
msg = va_arg(args, char *);
va_end(args);
Rf_error("Fatal error in the XML event driven parser for %s: %s",
((RS_XMLParserData*) ctx)->fileName, msg);
}
void
RS_XML(errorHandler)(void *ctx, const char *format, ...)
{
const char *msg = "error message unavailable";
va_list args;
va_start(args, format);
if(strcmp(format, "%s") == 0)
msg = va_arg(args, char *);
va_end(args);
Rf_error("Error in the XML event driven parser for %s: %s",
((RS_XMLParserData*) ctx)->fileName, msg);
}
// was RS_XML(structuredErrorHandler)(void *ctx, const xmlError err)
void
#if LIBXML_VERSION < 21200
RS_XML(structuredErrorHandler)(void *ctx, xmlErrorPtr err)
#else
RS_XML(structuredErrorHandler)(void *ctx, const struct _xmlError *err)
#endif
{
if(err->level == XML_ERR_FATAL) {
Rf_error("Error in the XML event driven parser (line = %d, column = %d): %s",
err->line, err->int2 , err->message);
} else {
Rf_warning("Error in the XML event driven parser (line = %d, column = %d): %s",
err->line, err->int2 , err->message);
}
}
void
RS_XML(warningHandler)(void *ctx, const char *msg, ...)
{
Rf_warning("XML event driven parser warning from %s.",
((RS_XMLParserData*) ctx)->fileName);
}
SEXP
RS_XML_xmlStopParser(SEXP r_context)
{
xmlParserCtxtPtr context;
if(TYPEOF(r_context) != EXTPTRSXP || R_ExternalPtrTag(r_context) != Rf_install(XML_PARSER_CONTEXT_TYPE_NAME)) {
Rf_error("xmlStopParser requires an " XML_PARSER_CONTEXT_TYPE_NAME " object");
}
context = (xmlParserCtxtPtr) R_ExternalPtrAddr(r_context);
if(!context) {
Rf_error("NULL value passed to RS_XML_xmlStopParser. Is it a value from a previous session?");
}
xmlStopParser(context);
return(ScalarLogical(1));
}
XML/src/libxmlFeatures.c 0000644 0001751 0000144 00000004432 14327573457 014675 0 ustar hornik users #include "RS_XML.h"
#include "Utils.h"
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
SEXP
R_getXMLFeatures(void)
{
#ifdef HAVE_XML_HAS_FEATURE
int features[] = {
XML_WITH_THREAD,
XML_WITH_TREE,
XML_WITH_OUTPUT,
XML_WITH_PUSH,
XML_WITH_READER,
XML_WITH_PATTERN,
XML_WITH_WRITER,
XML_WITH_SAX1,
XML_WITH_FTP,
XML_WITH_HTTP,
XML_WITH_VALID,
XML_WITH_HTML,
XML_WITH_LEGACY,
XML_WITH_C14N,
XML_WITH_CATALOG,
XML_WITH_XPATH,
XML_WITH_XPTR,
XML_WITH_XINCLUDE,
XML_WITH_ICONV,
XML_WITH_ISO8859X,
XML_WITH_UNICODE,
XML_WITH_REGEXP,
XML_WITH_AUTOMATA,
XML_WITH_EXPR,
XML_WITH_SCHEMAS,
XML_WITH_SCHEMATRON,
XML_WITH_MODULES,
XML_WITH_DEBUG,
XML_WITH_DEBUG_MEM,
XML_WITH_DEBUG_RUN,
#ifdef HAVE_XML_WITH_ZLIB
XML_WITH_ZLIB
#else
-1
#endif
};
const char * const names[] = {
"THREAD",
"TREE",
"OUTPUT",
"PUSH",
"READER",
"PATTERN",
"WRITER",
"SAX1",
"FTP",
"HTTP",
"VALID",
"HTML",
"LEGACY",
"C14N",
"CATALOG",
"XPATH",
"XPTR",
"XINCLUDE",
"ICONV",
"ISO8859X",
"UNICODE",
"REGEXP",
"AUTOMATA",
"EXPR",
"SCHEMAS",
"SCHEMATRON",
"MODULES",
"DEBUG",
"DEBUG_MEM",
"DEBUG_RUN",
"ZLIB"
};
SEXP ans, rnames;
int n = sizeof(features)/sizeof(features[0]), i;
PROTECT(ans = allocVector(LGLSXP, n));
PROTECT(rnames = allocVector(STRSXP, n));
for(i = 0; i < n; i++) {
if(features[i] > -1)
LOGICAL(ans)[i] = xmlHasFeature(features[i]);
else
LOGICAL(ans)[i] = NA_LOGICAL;
SET_STRING_ELT(rnames, i, mkChar(names[i]));
}
SET_NAMES(ans, rnames);
UNPROTECT(2);
return(ans);
#else
return(allocVector(STRSXP, 0));
#endif
}
XML/src/RSCommon.h 0000644 0001751 0000144 00000001240 13607633744 013377 0 ustar hornik users
/*
Copyright the Omegahat project 1999-2005.
Distributed under the GPL license (version 2).
*/
/*
Cut-dpwn version for XML as an R package
*/
#ifndef RSCOMMON_H
#define RSCOMMON_H
#ifdef __cplusplus
extern "C" {
#endif
#include
#include
#ifdef length
#undef length
#endif
#ifdef GET_LENGTH
#undef GET_LENGTH
#define GET_LENGTH(x) Rf_length(x)
#endif
#ifdef append
#undef append
#endif
typedef SEXP USER_OBJECT_;
typedef int RSInt;
#include "R_ext/Boolean.h"
#define CHAR_DEREF(x) CHAR((x))
#define IS_FUNCTION(x) isFunction((x))
#ifdef __cplusplus
}
#endif
#endif /* end of RSCOMMON_H*/
XML/src/EventParse.c 0000644 0001751 0000144 00000035136 14106741723 013754 0 ustar hornik users
/*
File that provides the entry point for an event driven XML parser
that performs callbacks to the different user-level functions in
the closure passed to it.
* See Copyright for the license status of this software.
*/
#include "EventParse.h"
#define R_USE_XML_ENCODING 1
#include "Utils.h" /* For the findFunction and invokeFunction. */
#undef R_USE_XML_ENCODING /*XXX */
#include "RSCommon.h"
extern void R_PreserveObject(SEXP);
extern void R_ReleaseObject(SEXP);
/*
Read the specified file as an XML document and invoke functions/methods in
the handlers closure object when each node in the tree is encountered by
the parser. These events are startElement,endElement, character data, etc.
The remaining arguments control how the calls to the user level functions
are made. The first (addContext) indicates whether information about the position
in the tree (an integer index path)
*/
typedef Rboolean Sboolean;
Sboolean
IsConnection(USER_OBJECT_ obj)
{
int i;
USER_OBJECT_ k = GET_CLASS(obj);
if(GET_LENGTH(k) == 0)
return(FALSE);
for(i = 0; i < GET_LENGTH(k); i++) {
if(strcmp("connection", CHAR_DEREF(STRING_ELT(k, i))) == 0)
return(TRUE);
}
return(FALSE);
}
static USER_OBJECT_
RS_XML(createAttributesList)(const char **atts, const xmlChar *encoding)
{
int n=0, i;
const char **ptr = atts;
USER_OBJECT_ attr_names;
USER_OBJECT_ attr_values;
while(ptr && ptr[0]) {
n++;
ptr += 2;
}
if(n < 1)
return(NULL_USER_OBJECT);
PROTECT(attr_values = NEW_CHARACTER(n));
PROTECT(attr_names = NEW_CHARACTER(n));
ptr = atts;
for(i=0; i < n; i++, ptr+=2) {
SET_STRING_ELT(attr_values, i, ENC_COPY_TO_USER_STRING(ptr[1]));
SET_STRING_ELT(attr_names, i, ENC_COPY_TO_USER_STRING(ptr[0]));
}
SET_NAMES(attr_values, attr_names);
UNPROTECT(2);
return(attr_values);
}
USER_OBJECT_
RS_XML(Parse)(USER_OBJECT_ fileName, USER_OBJECT_ handlers, USER_OBJECT_ endElementHandlers,
USER_OBJECT_ addContext,
USER_OBJECT_ ignoreBlanks, USER_OBJECT_ useTagName, USER_OBJECT_ asText,
USER_OBJECT_ trim, USER_OBJECT_ useExpat, USER_OBJECT_ stateObject,
USER_OBJECT_ replaceEntities, USER_OBJECT_ validate, USER_OBJECT_ saxVersion,
USER_OBJECT_ branches, USER_OBJECT_ useDotNames, USER_OBJECT_ errorFun,
USER_OBJECT_ manageMemory, USER_OBJECT_ r_encoding)
{
#ifdef LIBEXPAT
FILE *file = NULL;
int expat = 0;
#endif
char *name, *input;
RS_XML_ContentSourceType asTextBuffer;
RS_XMLParserData *parserData;
USER_OBJECT_ ans;
int status;
if(IsConnection(fileName) || isFunction(fileName))
asTextBuffer = RS_XML_CONNECTION;
else
asTextBuffer = LOGICAL_DATA(asText)[0] ? RS_XML_TEXT : RS_XML_FILENAME;
#ifdef LIBEXPAT
expat = LOGICAL_DATA(useExpat)[0];
if(expat && asTextBuffer == 0) {
#ifdef USE_R
name = R_ExpandFileName(CHAR(STRING(fileName)[0]));
#else
name = CHARACTER_DATA(fileName)[0];
#endif
file = fopen(name,"r");
if(file == NULL) {
Rf_error("Can't find file %s", name);
}
} else
#endif /* ifdef LIBEXPAT */
if(asTextBuffer == RS_XML_CONNECTION) {
name = strdup("");
input = (char *)fileName;/*XXX*/
} else {
name = strdup(CHAR_DEREF(STRING_ELT(fileName, 0)));
input = name;
}
parserData = RS_XML(createParserData)(handlers, manageMemory);
parserData->endElementHandlers = endElementHandlers;
parserData->branches = branches;
parserData->fileName = name;
parserData->callByTagName = LOGICAL_DATA(useTagName)[0];
parserData->addContextInfo = LOGICAL_DATA(addContext)[0];
parserData->trim = LOGICAL_DATA(trim)[0];
parserData->ignoreBlankLines = LOGICAL_DATA(ignoreBlanks)[0];
parserData->stateObject = (stateObject == NULL_USER_OBJECT ? NULL : stateObject);
parserData->useDotNames = LOGICAL_DATA(useDotNames)[0];
parserData->dynamicBranchFunction = NULL;
/*Is this necessary? Shouldn't it already be protected? Or is there a chance that we may
be doing this asynchronously in a pull approach. */
if(parserData->stateObject && parserData->stateObject != NULL_USER_OBJECT)
R_PreserveObject(parserData->stateObject);
#ifdef LIBEXPAT
if(expat) {
if(asTextBuffer == 0) {
RS_XML(parseWithParserData)(file, parserData);
} else {
parserData->fileName = "";
RS_XML(parseBufferWithParserData)(name, parserData);
free(name); /* match the strdup() above */
}
} else
#endif /* ifdef LIBEXPAT */
#if 0
/* If one wants entities expanded directly and to appear as text. */
xmlSubstituteEntitiesDefault(LOGICAL_DATA(replaceEntities)[0]);
#endif
status = RS_XML(libXMLEventParse)(input, parserData, asTextBuffer, INTEGER_DATA(saxVersion)[0], r_encoding);
/* How about using R_alloc() here so that it is freed, i.e. for the fileName and the parserData itself. */
ans = parserData->stateObject ? parserData->stateObject : handlers;
free(parserData->fileName);
if(parserData->stateObject && parserData->stateObject != NULL_USER_OBJECT)
R_ReleaseObject(parserData->stateObject);
if(status != 0)
RSXML_structuredStop(errorFun, NULL);
/* free(parserData); Now using R_alloc */
return(ans);
}
/**
Handler that receives declarations of unparsed entities. These are entity declarations that have a notation (NDATA) field:
*/
void
RS_XML(entityDeclarationHandler)(void *userData, const XML_Char *entityName,
const XML_Char *base, const XML_Char *systemId,
const XML_Char *publicId, const XML_Char *notationName)
{
RS_XMLParserData *parserData = (RS_XMLParserData*)userData;
USER_OBJECT_ opArgs;
int i, num;
const XML_Char *xml_args[5];
DECL_ENCODING_FROM_EVENT_PARSER(parserData)
num = sizeof(xml_args)/sizeof(xml_args[0]);
xml_args[0] = entityName; xml_args[1] = base;
xml_args[2] = systemId; xml_args[3] = publicId;
xml_args[4] = notationName;
opArgs = PROTECT(NEW_LIST(num));
for(i =0;i < num; i++) {
SET_VECTOR_ELT(opArgs, i, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, i), 0, ENC_COPY_TO_USER_STRING(xml_args[i] ? (const xmlChar *)xml_args[i] : (const xmlChar *)""));
}
RS_XML(callUserFunction)(HANDLER_FUN_NAME(parserData, "entityDeclaration"),
(const char*)NULL, parserData, opArgs);
UNPROTECT(1);
}
void
RS_XML(startElement)(void *userData, const char *name, const char **atts)
{
USER_OBJECT_ opArgs;
int i;
RS_XMLParserData *rinfo = (RS_XMLParserData*) userData;
DECL_ENCODING_FROM_EVENT_PARSER(rinfo)
if((i = R_isBranch(CHAR_TO_XMLCHAR(name), rinfo)) != -1) {
R_processBranch(rinfo, i, CHAR_TO_XMLCHAR(name), NULL, NULL, 0, NULL, 0, 0, (const xmlChar ** /*XXX*/) atts, 1);
return;
}
PROTECT(opArgs = NEW_LIST(2));
SET_VECTOR_ELT(opArgs, 0, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 0), 0, ENC_COPY_TO_USER_STRING(name));
/* Now convert the attributes list. */
SET_VECTOR_ELT(opArgs, 1, RS_XML(createAttributesList)(atts, encoding));
RS_XML(callUserFunction)(HANDLER_FUN_NAME(rinfo, "startElement"), name, ((RS_XMLParserData*) userData), opArgs);
UNPROTECT(1);
}
void
RS_XML(commentHandler)(void *userData, const XML_Char *data)
{
USER_OBJECT_ opArgs = NEW_LIST(1);
RS_XMLParserData *rinfo = (RS_XMLParserData *) userData;
DECL_ENCODING_FROM_EVENT_PARSER(rinfo)
PROTECT(opArgs);
SET_VECTOR_ELT(opArgs, 0, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 0), 0, ENC_COPY_TO_USER_STRING(data));
RS_XML(callUserFunction)(HANDLER_FUN_NAME(rinfo, "comment"),
(const char *)NULL, ((RS_XMLParserData*)userData), opArgs);
UNPROTECT(1);
}
USER_OBJECT_
findEndElementFun(const char *name, RS_XMLParserData *rinfo)
{
int i, n;
USER_OBJECT_ names = GET_NAMES(rinfo->endElementHandlers);
n = GET_LENGTH(rinfo->endElementHandlers);
for(i = 0 ; i < n ; i++) {
if(strcmp(CHAR_DEREF(STRING_ELT(names, i)), name) == 0)
return(VECTOR_ELT(rinfo->endElementHandlers, i));
}
return(NULL);
}
void RS_XML(endElement)(void *userData, const char *name)
{
USER_OBJECT_ opArgs, fun;
RS_XMLParserData *rinfo = (RS_XMLParserData *) userData;
DECL_ENCODING_FROM_EVENT_PARSER(rinfo)
if(rinfo->current) {
/* Dealing with a branch, so close up. */
R_endBranch(rinfo, CHAR_TO_XMLCHAR(name), NULL, NULL);
return;
}
((RS_XMLParserData*)userData)->depth++; /* ??? should this be depth-- */
PROTECT(opArgs = NEW_LIST(1));
SET_VECTOR_ELT(opArgs, 0, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 0), 0, ENC_COPY_TO_USER_STRING(name));
fun = findEndElementFun(name, rinfo);
if(fun) {
USER_OBJECT_ val = PROTECT(RS_XML(invokeFunction)(fun, opArgs, rinfo->stateObject, rinfo->ctx));
updateState(val, rinfo);
UNPROTECT(1);
}
else
RS_XML(callUserFunction)(HANDLER_FUN_NAME(rinfo, "endElement"), NULL, ((RS_XMLParserData*) userData), opArgs);
UNPROTECT(1);
}
/**
Called for inline expressions of the form
such as
*/
void
RS_XML(processingInstructionHandler)(void *userData, const XML_Char *target, const XML_Char *data)
{
USER_OBJECT_ opArgs;
RS_XMLParserData *parserData = (RS_XMLParserData *) userData;
DECL_ENCODING_FROM_EVENT_PARSER(parserData)
PROTECT(opArgs = NEW_LIST(2));
SET_VECTOR_ELT(opArgs, 0, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 0), 0, ENC_COPY_TO_USER_STRING(target));
SET_VECTOR_ELT(opArgs, 1, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 1), 0, ENC_COPY_TO_USER_STRING(data));
RS_XML(callUserFunction)(HANDLER_FUN_NAME(parserData, "processingInstruction"),
(const char *)NULL, (RS_XMLParserData*)userData, opArgs);
UNPROTECT(1);
}
void
RS_XML(startCdataSectionHandler)(void *userData)
{
}
void
RS_XML(endCdataSectionHandler)(void *userData)
{
}
char *
fixedTrim(char *str, int len, int *start, int *end)
{
char *tmp;
*end = len;
*start = 0;
/* If a degenerate string, just return. */
if(len == 0 || str == (char*)NULL || str[0] == '\0')
return(str);
/* Jump to the end */
tmp = str + len - 2; // DTL has 1
while(tmp >= str && isspace(*tmp)) {
tmp--;
(*end)--;
}
if(tmp == str) {
return(str);
}
tmp = str;
while(*start <= *end && *tmp && isspace(*tmp)) {
tmp++;
(*start)++;
}
return(tmp);
}
void
RS_XML(textHandler)(void *userData, const XML_Char *s, int len)
{
char *tmpString, *tmp;
USER_OBJECT_ opArgs = NULL;
RS_XMLParserData *parserData = (RS_XMLParserData*)userData;
DECL_ENCODING_FROM_EVENT_PARSER(parserData)
int nprot = 0;
/* XXX Here is where we have to ignoreBlankLines and use the trim setting in parserData */
if(parserData->current) {
xmlChar *tmp;
int newLen = len, start = 0, end = len;
#if 1
if(parserData->trim) {
tmpString = fixedTrim(XMLCHAR_TO_CHAR(s), len, &start, &end);
newLen = end - start;
} else
tmpString = XMLCHAR_TO_CHAR(s);
if(newLen < 0 && parserData->ignoreBlankLines)
return;
#else
tmpString = s;
#endif
if(newLen < 0)
tmp = (xmlChar *)strdup("");
else {
tmp = (xmlChar *) S_alloc(newLen + 2, sizeof(xmlChar));
memcpy(tmp, tmpString, newLen); tmp[newLen] = '\0';
}
xmlAddChild(parserData->current, xmlNewText(tmp));
//XXX???
if(newLen < 0) free(tmp);
return;
}
/* Last case handles ignoring the new line between the two nodes if trim is TRUE.
*/
if(s == (XML_Char*)NULL || s[0] == (XML_Char)0 || len == 0
|| (len == 1 && ((const char *) s)[0] == '\n' && parserData->trim))
return;
/*XXX Deal with encoding, memory cleanup,
1 more than length so we can put a \0 on the end. */
tmp = tmpString = (char*)calloc(len+1, sizeof(char));
strncpy(tmpString, s, len);
if(parserData->trim) {
tmpString = trim(tmpString);
len = (int) strlen(tmpString);
}
if(len > 0 || parserData->ignoreBlankLines == 0 ) {
PROTECT(opArgs = NEW_LIST(1)); nprot++;
SET_VECTOR_ELT(opArgs, 0, NEW_CHARACTER(1));
SET_STRING_ELT(VECTOR_ELT(opArgs, 0), 0, ENC_COPY_TO_USER_STRING(tmpString));
}
free(tmp);
/* If we are ignoring blanks and the potentially newly computed length is non-zero, then
call the user function.
*/
if(opArgs != NULL) {
RS_XML(callUserFunction)(HANDLER_FUN_NAME(parserData, "text"), (const char *)NULL, ((RS_XMLParserData*) userData), opArgs);
}
UNPROTECT(nprot);
}
int
RS_XML(notStandAloneHandler)(void *userData)
{
/* printf("In NotStandalone handler\n"); */
return(1);
}
/**
Create the parser data which contains the
the collection of functions to call for each
event type.
This allocates the parser memory using calloc.
The caller should arrange to free it.
*/
RS_XMLParserData *
RS_XML(createParserData)(USER_OBJECT_ handlers, USER_OBJECT_ finalize)
{
RS_XMLParserData *parser = (RS_XMLParserData *) R_alloc(1, sizeof(RS_XMLParserData));
memset(parser, '\0', sizeof(RS_XMLParserData));
parser->methods = handlers;
parser->finalize = finalize;
return(parser);
}
/**
Routine that locates and invokes the R function in the collection of handlers.
opName is the identifier for the generic operation, i.e. startElement, text, etc.
perferredName is the identifier for the node.
*/
USER_OBJECT_
RS_XML(callUserFunction)(const char *opName, const char *preferredName, RS_XMLParserData *parserData, USER_OBJECT_ opArgs)
{
USER_OBJECT_ fun = NULL, val;
USER_OBJECT_ _userObject = parserData->methods;
// int general = 0;
R_CHECK_INTERRUPTS
if(preferredName && parserData->callByTagName) {
fun = RS_XML(findFunction)(preferredName, _userObject);
}
if(fun == NULL) {
// general = 1;
fun = RS_XML(findFunction)(opName, _userObject);
}
if(fun == NULL || isFunction(fun) == 0 ) {
/* || (general && R_isInstanceOf(fun, "AsIs"))) Should we do this? */
/* FAILED */
return(NULL_USER_OBJECT);
}
val = PROTECT(RS_XML(invokeFunction)(fun, opArgs, parserData->stateObject, parserData->ctx));
updateState(val, parserData);
UNPROTECT(1);
return(val);
}
void
updateState(USER_OBJECT_ val, RS_XMLParserData *parserData)
{
if(!parserData->stateObject || parserData->stateObject == NULL_USER_OBJECT) {
return;
}
#ifdef _R_
R_ReleaseObject(parserData->stateObject);
R_PreserveObject(val);
#else
decr_ref_count(parserData->stateObject, TRUE, Local_data, S_evaluator);
incr_ref_count(val, TRUE, Local_data, S_evaluator);
#endif
parserData->stateObject = val;
}
XML/src/DocParse.h 0000644 0001751 0000144 00000003120 13607633744 013401 0 ustar hornik users /*
* See Copyright for the license status of this software.
*/
#ifndef XMLPARSE_H
#define XMLPARSE_H
#include
#include
#include "RSCommon.h"
#include "RS_XML.h"
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
typedef struct {
int skipBlankLines;
int trim;
int xinclude;
USER_OBJECT_ converters;
int addAttributeNamespaces;
int internalNodeReferences;
int fullNamespaceInfo;
int useDotNames;
SEXP finalize;
} R_XMLSettings;
enum {DOWN, SIDEWAYS};
USER_OBJECT_ RS_XML(convertXMLDoc)(const char *fileName, xmlDocPtr doc, USER_OBJECT_ converterFunctions, R_XMLSettings *settings);
/*USER_OBJECT_ RS_XML(createXMLNode)(xmlNodePtr node, int recursive, int direction, R_XMLSettings *settings, USER_OBJECT_ parentUserNode);*/
USER_OBJECT_ RS_XML(AttributeList)(xmlNodePtr node, R_XMLSettings *settings);
USER_OBJECT_ RS_XML(createNodeChildren)(xmlNodePtr node, int direction, R_XMLSettings *parserSettings);
USER_OBJECT_ RS_XML(lookupGenericNodeConverter)(xmlNodePtr node, USER_OBJECT_ methods, R_XMLSettings *parserSettings);
USER_OBJECT_ RS_XML(createNameSpaceIdentifier)(xmlNs *space, xmlNodePtr node);
USER_OBJECT_ RS_XML_xmlXIncludeProcessFlags(USER_OBJECT_ r_doc, USER_OBJECT_ r_flags);
USER_OBJECT_ processNamespaceDefinitions(xmlNs *ns, xmlNodePtr node, R_XMLSettings *parserSettings);
typedef struct _R_NodeGCInfo {
struct _R_NodeGCInfo *prev;
struct _R_NodeGCInfo *next;
xmlNodePtr node;
int count;
} R_NodeGCInfo;
void initDocRefCounter(xmlDocPtr doc);
#endif
XML/src/fixNS.c 0000644 0001751 0000144 00000005005 13607633771 012727 0 ustar hornik users #include
#include "RSCommon.h"
#include "RS_XML.h"
#define R_USE_XML_ENCODING 1
#include "Utils.h" /* R_createXMLNodeRef, Encoding macros. */
// need to release any namespace.
int fixDummyNS(xmlNodePtr node, int recursive);
int setDummyNS(xmlNodePtr node, const xmlChar *prefix);
xmlNs *findNSByPrefix(xmlNodePtr node, const xmlChar *prefix);
SEXP
R_fixDummyNS(SEXP r_node, SEXP r_recursive)
{
xmlNodePtr node;
int status;
node = (xmlNodePtr) R_ExternalPtrAddr(r_node);
status = fixDummyNS(node, LOGICAL(r_recursive)[0]);
return(ScalarInteger(status));
}
int
fixDummyNS(xmlNodePtr node, int recursive)
{
xmlNs *ns = node->ns;
int count = 0;
if(ns && strcmp((const char *)ns->href, "") == 0)
count = setDummyNS(node, ns->prefix);
if(recursive) {
xmlNodePtr ptr = node->children;
while(ptr) {
count += fixDummyNS(ptr, recursive);
ptr = ptr->next;
}
}
return(count);
}
int
setDummyNS(xmlNodePtr node, const xmlChar *prefix)
{
xmlNodePtr a = node->parent;
while(a) {
xmlNs *ns;
ns = findNSByPrefix(a, prefix);
if(ns) {
#ifdef R_XML_DEBUG
fprintf(stderr, "mapping %s to %s\n", prefix, ns->href);fflush(stderr);
#endif
node->nsDef = node->nsDef->next;
xmlSetNs(node, ns);
return(1);
}
a = a->parent;
}
return(0);
}
xmlNs *
findNSByPrefix(xmlNodePtr node, const xmlChar *prefix)
{
xmlNs *ptr = node->nsDef;
while(ptr) {
if((!prefix || !prefix[0]) && !ptr->prefix)
return(ptr);
if(prefix && ptr->prefix && strcmp((const char *)ptr->prefix, (const char *)prefix) == 0)
return(ptr);
ptr = ptr->next;
}
return(NULL);
}
void
setDefaultNs(xmlNodePtr node, xmlNsPtr ns, int recursive)
{
if(!node->ns)
xmlSetNs(node, ns);
if(recursive) {
xmlNodePtr cur = node->children;
while(cur) {
setDefaultNs(cur, ns, 1);
cur = cur->next;
}
}
}
SEXP
R_getAncestorDefaultNSDef(SEXP r_node, SEXP r_recursive)
{
xmlNodePtr cur, node;
xmlNs *ans = NULL;
cur = (xmlNodePtr) R_ExternalPtrAddr(r_node);
node = cur->parent;
while(node && (node->type != XML_DOCUMENT_NODE &&
node->type != XML_HTML_DOCUMENT_NODE)) { /* Need to check for HTML_DOC or XML_DOC ?*/
ans = findNSByPrefix(node, NULL);
if(ans)
break;
node = node->parent;
}
if(ans) {
xmlSetNs(cur, ans);
if(LOGICAL(r_recursive)[0]) {
setDefaultNs(cur, ans, 1);
}
return(ScalarLogical(1)); // R_createXMLNsRef(ans));
}
return(R_NilValue);
}
XML/src/NodeGC.h 0000644 0001751 0000144 00000002010 13607633744 012775 0 ustar hornik users #ifndef NODEGC_H
#define NODEGC_H
#if 1
/*
We use the address of a global variable as a marker/signature that
indicates we created the value of _private.
*/
extern int R_XML_MemoryMgrMarker;
extern int R_XML_NoMemoryMgmt;
#define R_MEMORY_MANAGER_MARKER R_XML_MemoryMgrMarker
#define IS_NOT_OUR_DOC_TO_TOUCH(doc) (doc->_private == NULL || (doc->_private && doc->_private == &R_XML_NoMemoryMgmt) || ((int*)doc->_private)[1] != R_MEMORY_MANAGER_MARKER)
#define IS_NOT_OUR_NODE_TO_TOUCH(node) ((node->_private == NULL) || (node->doc && node->doc->_private && node->doc->_private == &R_XML_NoMemoryMgmt) || ((int*)node->_private)[1] != R_MEMORY_MANAGER_MARKER)
#else
/* Not used. */
#define IS_NOT_OUR_DOC_TO_TOUCH(doc) (doc && doc->name && strcmp((doc)->name, " fake node libxslt") == 0)
#define IS_NOT_OUR_NODE_TO_TOUCH(node) (node && (node)->doc && IS_NOT_OUR_DOC_TO_TOUCH((node)->doc))
#endif
void decrementNodeRefCount(SEXP rnode);
void initDocRefCounter(xmlDocPtr doc);
void decrementNodeRefCount(SEXP rnode);
#endif
XML/ChangeLog 0000644 0001751 0000144 00000160763 14636531006 012523 0 ustar hornik users ============ entries from CRAN ============
Version 3.99-0.17
Changes for safe use of R_ExternalPtrAddr() in src/XMLTree.c.
Version 3.99-0.16.1
Changes for libxml2 >= 2.11.0, in src/DocParse.c and src/XMLEventParse.c
Version 3.99-0.16
Avoid prntf-like warnings
Rd markup
Version 3.99-0.15
Complete stub in LICENSE file.
Version 3.99-0.14
remove unexported generic append()
update URLs
Version 3.99-0.13
use snprintf instead of sprintf
Version 3.99-0.12
version for libxml2 2.10.x
update URLs
tweaks for -Wstrict-prototypes
Version 3.99-0.11
workaround for a LaTeX message that causes R CMD check to
interpret it as an error
reduce LaTeX warnings
Version 3.99-0.10
Rd markup
Version 3.99-0.9
replace default.stringsAsFactors() by FALSE
Version 3.99-0.8
run autoupdate
Version 3.99-0.7
use Rf_{error,warning} rather than S legacy macros
Version 3.99-0.6
Add src/Makevars.ucrt
Version 3.99-0.5
Update src/Makevars.win
Add missing PROTECT() wrappers.
Version 3.99-0.4
replace --slave by --no-echo
Version: 3.99-0.3
follow DTL with BSD_3_clause
tweak for Windows
version 3.99-0.2 (2020-01-18)
CRAN (not DTL) as maintainer.
version 3.99-0.1
First 3.99 version in R svn, bug fixes.
============ entries from DTL ============
Version 3.99-0
* We can specify R functions and C routines for use as XPath
functions in calls to getNodeSet() and xpathApply().
* Implementations of XPath 2.0 functions matches(), lower-case(),
ends-with(), abs(), min(), max(), replace()
Version 3.98-2
* xmlSave() of a document to a file with encoding now honors indenting.
Uses xmlSaveFormatFileEnc(). Issue identified by Earl Brown.
Version 3.98-1
* xmlToS4() handles attributes with namespace prefixes and children
with the same node name.
* Compilation error with clang. Simple declaration of a routine.
* xmlXIncludes() added.
* Changes to simplifyPath().
Version 3.98-0
* Update for libxml2-2.9.1 and reading from a connection for xmlEventParse().
* xmlIncludes() is a hierarchical version of getXIncludes()
* Modifications to xmlSource(), e.g. verbose = TRUE as default.
Version 3.97-0
* Fix for xmlValue(node) = text. Identified by Lawrence Edwards.
Uses xmlNodeSetContent() now and leaves freeing the original content to that routine.
* Updates for xmlSource()
Version 3.96-1
* readHTMLTable() ignores headers that are over 999 characters.
* Fix a problem in readHTMLTable() with some table headers not having
the correct number of elements to match the columns.
Version 3.96-0
* Introduced readHTMLList(), getHTMLLinks(), getHTMLExternalFiles(), getXIncludes().
* When serializing XMLNode objects, i.e. R representations of nodes, ensure " and <, etc. in attributes
are serialized correctly.
Version 3.95-1
* Allow htmlParse(), xmlParse(), etc. ?
Version 3.95-0
* Moved development version of the source code for the package to github -
https://github.com/omegahat/XML.git
* Changes to the structure of the package to allow installation directly rather than
via a one-step staging into the R package structure.
* Sample XML documents moved from data/ to exampleData, and examples updated.
* getDefaultNamespace() and matchNamespaces() use simplify = TRUE to call
xmlNamespaceDefinitions() to get the namespaces as a character vector rather than
list.
* Documentation updates
Version 3.94-0
* getNodeLocation() now reports the actual line number for text nodes rather than 0,
using the sibling nodes' or parent node's line number.
* xpathApply() and related functions work with builtin type "functions",
e.g. class.
* xpathApply() and related functions (getNodeSet, xpathSApply) allow
the caller to specify multiple queries as a character vector
and these are pasted together as compound location paths by
separating them with a '|'. This makes it easier for the
caller to manage the different queries.
* assigning to a child of a node works, e.g. node[["abc"]] = text/node
and node[[index]] = text/node. We replace a matching name. If the
replacement value is text, we use the name to
* getChildrenStrings() is a function that implements the equivalent of
xmlApply(node, xmlValue) but faster because we avoid the function call
for each element.
* options parameter for xmlParse() and htmlParse() for controlling the parser.
(Currently only used when encoding is explicitly specified.)
* encoding parameter for xmlParse() and xmlTreeParse() now works for XML documents,
not just HTML documents.
* Update for readHTMLTable() method so that we look at just the final node
in a .
Version 3.93-1
* Fixed bug in findXInclude() that sometimes got the wrong XMLXIncludeStartNode.
Hence getNodeLocation() might report the wrong file, but correct line number!
* findXInclude() now has a recursive parameter that resolves the chain of XIncludes.
This returns the full path to the file, relative to the base/top-level document,
not just the parent document.
* Change to the default value of the error parameter in htmlParse() and htmlTreeParse()
which will generate a structured R error if there is an IO error.
The set of issues that will raise an error will be broadened in the future.
Version 3.93-0
* Enabled the fixing of namespaces by finding the definition o
for that prefix in the ancestor nodes.
Version 3.92-2
* Synchronized compilation flags for Windows with those on OSX & Linux.
Version 3.92-1
* Restore original error handler function for htmlParse() and htmlTreeParse()
* Fixed a reference counting problem caused by not adding a finalizer in the
as() method for coercing an XMLInternalNode to an XMLInternalDocument.
Example from Janko Thyson.
* Fixed up some partial argument names found by R CMD check!
Version 3.92-0
* Added --enable-xml-debug option for the configure script and this activates
the debugging diagnostic reporting, mainly for the garbage collection and node
reference counts.
* Work-around for HTML documents not being freed (but XML documents are!)
* Added an isHTML parameter for xmlTreeParse.
* Merge htmlTreeParse/htmlParse with xmlTreeParse.
* Implemented some diagnostic facilities to determine if an external pointer
is in R's weak references list. This needs support within R. (Ask for code if
you want.)
Version 3.91-0
* Start of implementation to allow nested calls to newXMLNode() to use namespace prefixes
defined in ancestor nodes. Disabled at present.
Version 3.9-4
* readHTMLTable() passes the encoding to the cell function.
* xmlValue() and saveXML() use the encoding from the document, improving conversion of strings.
* More methods for getEncoding()
Version 3.9-3
* getEncoding() returns NA when the encoding is not known. Previously, this might seg-fault!
* readHTMLTable() passes an encoding argument to the call to xmlValue (and the value of elFun).
Version 3.9-2
* Static NAMESPACE (rather than generated via configure)
* Default for directory in Makevars.win to search for header files and libraries needed
for compilation.
Version 3.9-1
* Added method for removeNodes for XMLNodeList.
Version 3.9-0
* Enabled additional encoding for element, attribute and namespace names, and
in xmlValue().
* Corrected default value in documentation for parse in xmlSource().
Version 3.8-1
* Corrected documentation for readHTMLTable() about stringsAsFactors behaviour.
* Added parse = FALSE as parameter for xmlSource() to allow just returning the text from
each node.
Version 3.8-0
* added readSolrDoc() and readKeyValueDB() functions to read Solr and Property list documents.
Version 3.7-4
* saveXML() for XMLNode returns a character vector of length 1, i.e. a single string.
Version 3.7-3
* Allow xmlTreeParse() and xmlParse() to process content starting with a BOM.
This works when the name of a file/URL is provided, but didn't when the content
was provided directly as a string. Identified by Milan Bouchet-Valat.
* error message when XML content is not XML or a file name now puts the content at the end
for improved readability.
Version 3.7-2
* Import methods package explicitly.
Version 3.7-1
* Added an alias for the coerce method for Currency.
* Added a C routine to query if reference counting is enabled.
See tests/checkRefCounts.R.
Version 3.7-0
* Added Currency as an option for colClass in readHTMLTable to
convert strings of the form $xxx,yyy,zzz, i.e. comma-separated
and preceeded by a $. (No other currency supported yet.)
* Fix for newXMLNode() that caused a seg fault if a node was specified
as the document. Thanks to Jeff Allen.
Version 3.6-2
* Changed URL in readHTMLTable() example to new page for population of
countries
* Changes to Rprintf() rather than stderr. Still some code that uses stderr
intentionally.
Version 3.6-1
* Fix bug which caused XMLInternalUnknownNode in xmlParent() for HTML documents.
* General improvements to support nodes of type XML_HTML_DOCUMENT_NODE.
* removeNodes() method for XMLNodeSet.
Version 3.6-0
* xmlParent() is an S4 generic with methods.
* xmlAncestors() has a count argument to limit the number of ancestors
returned.
* removeNodes() is generic.
* addChildren() now removes "internal" nodes from their current parent, if any.
Avoids memory corruption in XML tree.
* ADD_XMLOUTPUT_BUFFER R variable for Windows.
* Defined XMLTreeNode as an old-style class.
Version 3.5-1
* Additional workaround for libxml2 2.6.16 for printing HTML document.
* noMatchOk parameter for xpathApply.XMLInternalNode to suppress warnings about
finding no nodes when there is a namespace in the query.
* xmlNamespace<-() function and methods to allow one to set the namespace
on a node, e.g., by the namespace prefix.
* readHTMLTable() allows "factor" as an entry in colClasses.
Version 3.5-0
* Addeds nsDef parameter for parseXMLAndAdd().
* Minor addition to readHTMLTable() methods to handle malformed HTML
with all the tr nodes in the thead.
Version 3.4-3
* Set default of append parameter in xmlChildren<-() method for non-internal nodes
to FALSE so that we replace the existing nodes.
Version 3.4-2
Version 3.4-1
* Type in C code for method for xmlClone().
* Minor fixes for formatting of 2 help/Rd files.
* Removed definition of XPathNodeSet which is never used here but redefined in Sxslt.
* Fix when adding a default namespace to a node in an HTML document.
* Fix when adding a default namespace to a node in an HTML document.
Version 3.4-0
* Added xmlSearchNs() to aid looking for XML definitions by URL or prefix.
* Support in readHTMLTable() for identifying values formatted as percents
or numbers with commas. Use the classes FormattedInteger, FormattedNumber
and Percent in colClasses.
Version 3.3-2
* Better handling of namespace definitions and uses in newXMLNode
and separation of internal code into a separate function.
Version 3.3-1
* Configuration to conditionally compile code and export functions
for removing finalizers. This relies on C routines tha will be
added to the base R distribution, so not present in any released
version of R as yet.
Version 3.3-0
* addFinalizer added as parameter to many functions and methods that
can return a reference to an internal/C-level node. This controls
whether a finalizer is added to the node and reference counting
is performed. See MemoryManagement.pdf/.html for more details.
* One can set the suppressXMLNamespaceWarning as either an XML option (via setOption())
or as a regular R option (via options(suppressXMLNamespaceWarning = ...) )
* Added methods for docName() for XMLHashTreeNode and XMLNode.
* added docName when converting from an internal tree to an XMLHashTree.
* xmlHashTree() uses an environment with no parent environment, by default.
* Added an append parameter to addChildren().
* Fixed coercion from XMLInternalNode to XMLNode.
* Made the methods (e.g. xmlAttrs<-(), xmlParent(), ...)
for XMLNode and XMLInternalNode consistent.
* Made classes agree for xmlParse() and newXMLDoc()
* fixed corner/end cases for getSibling for XMLHashTreeNode
* Added xmlRoot<- methods for XMLInternalDocument and XMLHashTree.
* Minor enhancement to xmlToDataFrame() so that one can pass
the value from getNodeSet() directly as the first argument to xmlToDataFrame()
without passing it via the nodes parameter.
* Registered all of the native routines being invoked via .Call().
Version 3.2-1
* Turn reference counting on by default again.
Version 3.2-0
* Change to reference to normalizePath() which was moved from utils to base in R-devel/R-2.13
Version 3.1-1
* Minor change in readHTMLTable method to identify table header better.
Version 3.1-0
* Method for [[ for internal element nodes that is much faster (by avoiding
creating the list of children and then indexing that R list).
Thanks to Stavros Mackracis for raising the issue.
Version 3.0-0
* This is not a major release, but an incremental numbering from 2.9-0 to 3.0-0, but with
one potentially significant change related to creating nodes. newXMLNode() now uses
the namespace of the parent node if the namespace argument is not specified.
* Refinments to improve the garbage counting and referencing counting on internal nodes.
Version 2.9-0
* xmlAttrs(, TRUE) for internal nodes returns the URL of each namespace definition
in the names of the attr(, "namespaces") vector.
* Added parseXMLAndAdd() to parse XML from a string text and
add the nodes to a parent node. This facilitates creating
a large number of quite regular nodes using string processing
techniques (e.g. sprintf(), paste())
* xmlEventParse() with branches now has garbage collecting activated.
Version 2.8-1
* Filled in missing documentation
* Added missing init = TRUE for the parameters in one of the methods for xmlSource().
Version 2.8-0
* xmlClone() puts the original S3 classes on the new object.
* Trivial fix to readHTMLTable() to get the header when the table header is inside
a tbody.
* Garbage collection/Memory management re-enabled.
Version 2.7-0
* compareXMLDocs() function
* Added xmlSourceFunctions() and xmlSourceSection()
* Support in saveXML() for XMLInternalDocument for the prefix parameter.
* saveXML() and related methods can deal with NULL pointers in
XMLInternalDocument objects.
* fixed bug in catalogAdd().
* docName() made an S4 generic with S4 methods (rather than S3 methods).
* added catalogDump()
* readHTMLTable() puts sensible names on the data frames if there is no header for the table.
Version 2.6-0
* When copying a node from one document to another, the node is explicitly
copied and not removed from the original document. This also fixes a problem
with the name space not being on the resulting node.
* New functions for converting simple, shallow XML structure to an R data frame.
xmlToDataFrame() & xmlToList()
* addChildren() can handle _copying_ a node from a different document.
* as()/coerce() method for URI to character.
* New functions to convert an XML tree to an S4 object and also to infer
S4 class definitions from XML. (makeClassTemplate(), xmlToS4())
* Minor change to C code for compilation on Solaris and Sun Studio
Version 2.5-3
* Trivial change to an Rd file to add an omitted
Version 2.5-2
* Configuration enhanced to handle very old (but standard on OS X) versions of libxml which do not have
the xmlHasFeature() routine.
People with such an old version of libxml (i.e. 2.6.16) should consider upgrading. That is 5 years old.
Version 2.5-1
* Added a configuration check and compile time condition for the presence of XML_WITH_ZLIB. This
allows installation with older versions of libxml2 such as 2.6.26.
* Moved some old S3 classes to S4 class definitions to deal with recent changes to the methods package.
Version 2.5-0
* Added xmlParseDoc() and parser option constants. These allow one to parse a document
from a file, URL or string and specify any combination of 20 different options controlling
the parser, e.g. whether to replace entities, perform XInclude, add start and end XInclude nodes,
expand entities, load external DTDs, recover when there are errors.
* Added libxmlFeatures() to dynamically determine which features were compiled into the version
of libxml2.
* newXMLNode() has a new argument sibling which is used to add the new node as the sibling of this
node. The parametr 'at' is used as the value for the 'after' parameter in addSibling().
* saveXML() is now an S4 generic. (Changes in other packages, e.g. Sxslt, RXMLHelp.)
* Added readHTMLTable() which is a reasonably robust and flexible way to read HTML tables.
* Added runTime parameter for libxmlVersion() so we can get compile and run time version information.
Version 2.4-0
* Significant change to garbage collection facilities for internal/C-level nodes.
This works hard to ensure that XMLInternalDocument objects and XMLInternalNode objects
in R remain valid even when their "parent" container is released in R. See memory.pdf.
This can be disabled with configuration argument --enable-nodegc=no.
* Configuration option to compile with xmlsec1 (or xmlsec1-openssl). More to come on support for this.
Version 2.3-0
* Added getLineNumber() to be able to determine the line number of an XML node within
its original document.
* xmlApply() and xmlSApply() have a parameter to ignore the XInclude start and end nodes.
* xmlChildren() also have an omitNodeTypes parameter and by default exclude XInclude nodes.
* Added ensureNamespace() to add a namespace definition(s) if necessary.
Version 2.2-1
* source() method equivalent to xmlSource() and appropriate installation
changes for older versions of R ( < 2.8.0).
Version 2.2-0
* Added xmlClone() and findXInclude() functions.
* [Important] Bug fix regarding the error handling function for XML and HTML parsing.
Uncovered by Roger Koenker. This manifested itself in R errors of the form
"attempt to apply non-function".
Version 1.99-1
* addChildren() unconditionally unlinks nodes that already have a parent.
* Typo bug in removeChildren.XMLNode code found and fixed by Kate Mullen.
Version 1.99-0
* Added recursive parameter to xmlValue() function to control whether to work on just the
immediate nodes or also children.
* Correction for xpathSApply() when returning an array/matrix which referred to a non-existent variable.
* Faster creation of internal nodes via newXMLNode().
* xmlRoot() for XMLHashTree works for empty trees.
* Added xmlValue<-() function.
* Fix for removeAttributes() with namespaces.
* Addition to configure script of the argument --with-xml-output-buffer to force
whether to compile and use our own "local" version of xmlOutputBufferCreateBuffer()
which is needed on unusual systems. Supplied by Jim Bullard (UC Berkeley).
Version 1.98-1
* Deal with older S3-style classes with inheritance for 2.7.2 differently from the 2.8.0
mechanism.
* Changes to catch more cases of xmlChar * being treated as char * which causes the Sun compiler to
fail to compile DocParse.c
* Export class XMLNamespaceDefinitions which caused problems in the code in the caMassClass package.
Version 1.98-0
* The function XML:::xpathSubNodeApply() is the implementation of xpathApply() for an XMLInternalNode
from earlier versions of the package and which explicitly moves the node to a new document and performs
the XPath query and then re-parents the node. Instead of using this, users can use xpathApply()/getNodeSet()
and simply change the XPath expression to be prefixed with ., e.g. instead of //tr, use .//tr to root the
XPath query at the current node.
* Minor patch to configure.in to allow for libxml2-2.7.*.
* saveXML() for XMLInternalDocument now uses xmlDocFormatDump() ratehr than xmlSaveFile()
and so formatting is "better".
* The [ and [[ operators for XMLInternalDocument support a 'namespaces' parameter
for ease of extracting nodes. This is syntactic sugar for getNodeSet()/xpathApply().
* xmlParse() and htmlParse() return internal documents and nodes by default and are easier to type.
The results are amenable to XPath queries and so these are the most flexible representations.
* xmlRoot() has a skip argument that controls whether to ignore comment and DTD nodes.
The default is TRUE.
* Additional functionality for XMLHashTree and XMLHashTreeNode, including facilities for creating nodes
while adding them to the tree, copying sub-trees/nodes to separate trees.
* Functionality to convert from an XMLInternalNode to an XMLHashTree - as(node, "XMLHashTree").
This is also an option in xmlTreeParse(, useHashTree = TRUE/FALSE)
[or xmlTreeParse(, treeType = "hashTree")]
* Branch nodes from xmlEventParse(, branches = list(...)) are now garbage collected appropriately.
* xmlAttrs.XMLInternalNode now does not add the namespace prefix to the name of the attribute,
by default. Use xmlAttrs(node, addNamespace = TRUE) to get old behaviour.
* xmlGetAttr() has a corresponding new parameter addNamespace that is passed through to the call to
xmlGetAttr().
* getRelativeURL() function available for getting URI of a document from a given attribute
relative to a base URL, e.g. an HTML or a .
* xmlAttrs<- methods support an append (TRUE by default) to add values to the existing attributes,
or to replace the existing ones with the right-hand side of the assignment.
* xmlAttrs<- checks for namespaces in all the ancestors for XMLInternalNode and XMLHashTreeNode.
* Introduced the class XMLAbstractNode which is the parent for the XMLNode, XMLInternalNode and
XMLHashTreeNode, which allows high-level methods that use the API to access the elements of the nodes
to be defined for a single type.
* Changed name of XMLNameSpace class to XMLNamespace (lower-case 's').
Version 1.97-1
* Fix for configuration in detecting existence of encoding
enumerations in R. So now encoding of strings is working again.
Version 1.97-0
* Added xmlNativeTreeParse() as an alias for xmlInternalTreeParse()
and xmlTreeParse(, useInternalNodes = TRUE).
* Assignment to attributes of an R-level XML node works again, e.g.
xmlAttrs(doc[[3]][[2]])['foo'] = "bar"
* Subsetting ([[) for XMLHashNode behaves correctly.
* Added .children parameter to addTag() function in xmlOutputDOM() objects.
* Thanks to Michael Lawrence, a significantly simpler and more
general mechanism is used for getNodeSet()/xpathApply() when
applied to a node and not a document. This allows xpath queries
that go back up the ancestor path for the node.
Version 1.96-0
* Functionality for working with XML Schema now incorporated.
* xmlSchemaValidate() function for validating a document against a schema.
* xmlSchemaValidate() using structured error handlers to give
information about line numbers, columns, domain, etc. as well as
the message.
* xmlChildren() method for XMLInternalDocument
* Recognize additional internal node types,
e.g. XMLXIncludeStartNode, ...
* foo.dtd example now uses internal and external entities for illustration.
Version 1.95-3
* configuration change to support older versions of R that do not
have the C enumeration type cetype_t defined in Rinternals.h.
Version 1.95-2
* Fix for xpathApply()/getNodeSet() on the top-level node of a document
which left the original document with no children! Found by Martin Morgan.
Version 1.95-1
* Minor bug fixes regarding Encoding issues introduce in 1.95-0.
* xmlEventParse() calls R_CheckUserInterrupt() when making callbacks to R functions
and so should make the GUI more responsive.
* Test for older versions of libxml2 which did not have a context field in the xmlNs
data structure.
Version 1.95-0
* Use the encoding of the document in creating R character strings to identify
the Encoding() in R. There are probably omissions and potential problems, so
I would be very grateful for examples which fail, along with the file, the locale
and the R code used to manipulate these.
Version 1.94-0
* Fixed a bug in xpathApply()/getNodeSet() applied to an XMLInternalNode
which now ensures that the nodes emerge with the original internal document
as their top-level document.
* Added processXInclude() for processing individual XInclude nodes
and determining what nodes they add.
* If asText is TRUE in xmlTreeParse(), xmlInternalTreeParse(), ...,
no call to file.exists() is made. This is both sensible and
overcomes a potential file name length limitation (at least on
Windows).
* The trim parameter for xmlInternalTreeParse() and
xmlTreeParse(, useInternal = TRUE) causes simple text nodes
containing blank space to be discarded. saveXML() will, by
default, put them back but not if text nodes are explicitly added.
* xmlTreeParse(), xmlInternalTreeParse(), htmlTreeParser(),
parseDTD(), etc. take an error handler function which defaults to
collecting all the errors and reporting them at the end of the
attempt to parse.
* getXMLErrors() returns a list of errors from the XML/HTML parser
for help in correcting documents.
* Added xmlStopParser() which can be used to terminate a parser from
R. This is useful in handler functions for SAX-style parsing via
xmlEventParse().
* A handler function passed to xmlEventParse() can indicate that it
wants to be passed a reference to the internal xmlParserContext by
having the class XMLParserContextFunction. Such functions will be
called with the context object as the first argument and the usual
arguments displaced by 1, e.g. the name and attributes for a
startElement handler would then be in positions 2 and 3.
* When parsing with useInternalNodes= TRUE and trim = TRUE in
xmlTreeParse() or xmlInternalTreeParse(), blank nodes are discarded
so line breaks between nodes are not returned as part of the tree.
This makes pretty-printing/indenting work on the resulting
document but does not return the exact content of the original
XML. Use trim = FALSE to preserve the breaks.
* Added xmlInternalTreeParse() which is a simple copy of xmlTreeParse()
with useInternalNodes defaulting to TRUE, so we get an internal C-level tree.
* Added an xpathSApply() function that simplifies the result to a
vector/matrix, if possible.
* Added replaceNode() function which allows one to insert an internal node
with another one.
* addChildren() has a new at parameter to specify where in the list
of children to add the new nodes.
* newXMLNode(), etc. can compute the document (doc argument) from
the parent.
* The subset operator applied to an XMLInternalDocument and
getNodeSubset() and xpathApply() compute the namespaces from the
top-level of the document by default, so, e.g., doc[["//r:init"]] work.
* section parameter added to xmlSource() to allow easy subsetting to
a particular within a document.
* added catalogLoad(), catalogAdd(), catalogClearTable() functions.
* Added docName() function for querying the file name or URL of a
parsed XML document.
* RS_XML_createDocFromNode() C routine adds root node
correctly via xmlAddChild().
* Slightly improved identification of HTML content rather than a file or URL name.
* Added a simplify parameter to the xmlNamespaceDefinition()
function which, if TRUE, returns a character vector giving the
prefix = URI pairs which can be used directly in xpathApply() and
getNodeSet().
Version 1.93-1
* Method for xmlNamespace with a character is now exported! Needed for cases that arise in
SSOAP.
Version 1.93-0
* The closeTag() function within an XMLInternalDOM object returned by xmlTree() provides
support for closing nodes by name or position in the stack of open nodes.
* xmlRoot() method for an XMLInternalDOM tree.
* Added a parent argument to the constructor functions for internal nodes, e.g. newXMLNode,
newXMLPINode, newXMLCDataNode, etc.
* doc argument for the constructor functions for internal nodes is now moved from second to third.
Calling
* Potentially changed the details about creating XML documents and nodes with namespaces. If these
negatively effect your code, please send me email (duncan@wald.ucdavis.edu).
* Enhancements and fixes for creating XML nodes and trees, especially with name spaces.
* Many minor changes to catch special cases in working with internal nodes.
Version 1.92-1
* Make addNode()/addTag() in XMLInternalDOM work with previously created XML nodes via newXMLNode().
Thanks to Seth Falcon for pointing out this omission. More improvements in the pipeline for generating
XML.
* addChildren for an XMLInternalNode can be given a list of XMLInternalNodes and/or character strings.
* xmlSource() handles r:codeIds better.
Version 1.92-0
* Added removeNodes function for unlinking XMLInternalNode objects directly by reference.
* xmlRoot() handles empty documents.
* Documentation cleanups.
Version 1.91-1
* Remove output about "cleaning"/releasing an internal document pointer.
* The warning from getNodeSet/xpathApply about using a prefix for the default namespace
now has a class/type of condition, specificall "XPathDefaultNamespace".
Version 1.91-0
* argument to add a finalizer for an XMLInternalDocument in xmlTreeParse()/htmlTreeParse() when
useInternalNodes = TRUE. If this is set, automatic garbage collection is done which will free
any sub-nodes. If you want to work with any of these nodes after the top-level tree variable
has been released, specify addFinalizer = FALSE and explicitly free the document yourself with the
free() function.
* Sme improvements on namespace prefixes in internal nodes. See newXMLNode().
* classes for additional XMLInternalNodes (e.g. XMLInternalCDataNode) now exported
* removeAttributes() has a .all argument to easily remove all the attributes within a node.
Supported for both R and internal style nodes.
* xmlAttrs<-() function for simply appending attributes to a node.
* If xmlTreeParse() is called with asText = FALSE and the file is not found, an error of class
"FileNotFound" is raised.
* [[ opertor for XMLInternalDocument to get the first/only entry in
the node set from an XPath query. This is a convenience
mechansim for accessing the element when there is only one.
Version 1.9-0
* Added xmlAncestors() functions for finding chain of parent nodes, and optionally applying a
function to each.
* xmlDoc() allows one to create a new XML document by copying an existing internal node, allowing
for work with sub-trees as regular documents, e.g. XPath queries restricted to a subset of the
tree.
* Ability to do XPath searches on sub-nodes within a document. getNodeSet() and xpathApply()
can now operate on an XMLInternalNode by creating a copy of the node and its sub-nodes into a
new document. However, these is memory leak associated with this and you should us xmlDoc()
to create a new document from the node and then perform the XPath query on that and free the
document.
Version 1.8-0
* Added xinclude argument to xmlTreeParse() and htmlTreeParse() to control whether
should be resolved and
the appropriate nodes inserted and the actual node discarded.
* The namespaces argument of getNodeSet() (and implicitly of the [ method for an
XMLInternalDocument object) can be a simple prefix name when referring to the
default namespace of the document, e.g.
getNodeSet(doc, "/r:help/r:keyword", "r")
when the document has a default namespace.
* Added a 'recursive = FALSE' parameter to xmlNamespaceDefinitions() to be able to
process all descendant nodes and so fetch the namespace definitions in an entire
sub-tree. This can be used as input to getNodeSet(), for example.
* as() method for converting an XMLInternalDocument to a node.
* xmlNamespaceDefinitions() handles the case where the top-level element
is not the first node, e.g. when there is a DOCTYPE node and/or a comment.
Version 1.7-3
* addChildren() coerces a string to an internal text node before adding the child.
Version 1.7-2
* Trivial error in free() for XMLInternalDocument objects fixed so the memory is released.
Version 1.7-1
* addition to configuration to detect whether the checked field of the xmlEntity structure is present.
Version 1.7-0
This a quite comprehensive enhancement to the facilities in the XML package. A lot of work on
the tools for creating or authoring XML from within R were added and improved. Using internal
nodes directly with newXMLNode() and friends, or using xmlTree() is probably the simplest.
But xmlHashTree() creates them in R.
* IMPORTANT: one can and should use the names .comment, .startElement, .processingInstruction,
.text, etc. when identifying general element handlers that apply to all elements of a particular type
in an XML document rather than to nodes that have a particular name. This differentiates between
a handler for a node named, say, text and a handler for all text elements found in the document.
To use this new approach, call xmlTreeParse() or xmlEventParse() with
useDotNames = TRUE
This will become the default in future releases.
* namespaceHandlers() function provided to deal with node handler functions with XML name spaces where
there may be multiple handlers for the same node name but which are in different XML name spaces.
* signature for entityDeclaration function in SAX interface is changed so that the second argument
identifies the type of entity. Also, to query the value of an entity, the C code calls the
getEntity() method of the handlers.
* addChildren() & removeChildren() and addAttributes() & removeAttributes() for an existing node allows for
post-creation modification of an XML node.
* Improved support for name spaces on node attributes.
* xmlName<-() methods for internal and R-level XML nodes to change the name of a node.
* saveXML() and as(, "character") method for XMLInternalNode objects now to create a text representation of the
internal nodes.
* xmlTree() allows for creating a top-level node in the call to xmlTree() directly and does not
ignore these arguments.
* DTD and associated DOCTYPE can be created separately or directly in xmlTree().
* xmlTree() now allows the caller to specify the doc object as an argument, including NULL
for when the nodes do not need to have a document object.
* Better support in xmlTree() for namespaces and maintaining a default/active namespace prefix that is to be
inserted on each subsequent node.
* new functions for creating different internal node types - newXMLCDataNode, newXMLPINode, newXMLCommentNode, newXMLDTDNode.
* newXMLNode() handles text, using the new newXMLTextNode() and coerce methods.
* xmlTree() supports an active/default name space prefix which is used for new nodes.
* Resetting the state of the xmlSubstituteEntities variable is handled correctly in the case of an error.
Version 1.6-4
* xmlSize() method for an XMLInternalNode.
Version 1.6-3
* Handle change from Sys.putenv() to Sys.setenv().
Version 1.6-2
* Added a URI (old) class label to the result of parseURI, and exported that class for use in
other packages (specifically SSOAP, at present).
* For subsetting child nodes by name, there is a new all = FALSE parameter which allows the caller
to get the first element(s) that matches the name(s), or all of them with, e.g.
node["bob", all = TRUE]. This allows us to avoid the equivalent idiom
node[ names(node) == "bob" ]
which is complicated when node is the result of an inline computation.
* added method for setting names on an XMLNode (names<-.XMLNode), not just for retrieving them.
Version 1.6-1
* Added catalogResolve() function for looking up local files and aliases for URIs, and
PUBLIC and SYSTEM identifiers, e.g. in DOCTYPE nodes.
* saveXML method added for XMLFlatTree. (Identified by Alberto Monteiro.)
* Fixed saveXML methods for various classes.
* Doctype class: added validity method, improved coercion to character, and slightly more flexible
constructor function. Validates PUBLIC identifier.
Version 1.6-0
* In saveXML() method for XMLInternalDocument, we "support" the encoding argument by passing it to
xmlDocDumpFormatMemoryEnc() or xmlSaveFileEnc() in the libxml2 C code.
We could also use the xmlSave() API of libxml2.
* htmlTreeParse() supports an encoding argument, e.g. htmlTreeParse("9003.html", encoding = "UTF-8").
This allows one to correctly process HTML documents that do not contain their encoding information in the
tag.
The argument is also present in xmlTreeParse() but currently ignored.
Version 1.5-1
* updated documentation for the alias for free method for XMLInternalDocument.
Version 1.5-0
* added free() generic function and method for XMLInternalDocument
Version 1.4-2
* xmlTreeParse and htmlTreeParse will accept a character vector of length > 1
and treat it as the contents of the XML stream and so call
paste(file, collapse = "\n") before parsing. The asText = TRUE is implied.
Thanks to Ingo Feinerer for prompting this addition.
Version 1.4-1
* Fix to ensure a connection is closed in saveXML. Identified by Herve Pages
* Update definition and documentation for xmlAttrs to take ... arguments.
Version 1.4-0
* Added fullNamespaceInfo parameter for xmlTreeParse() which, if TRUE,
provides the namespace for each node as a named character vector giving
the URI of the namespace and the prefix as the element name, i.e. c(prefix = uri)
The default is FALSE to preserve the earlier behavior. The namespace object
has a class XMLNamespacePrefix for the old-style, and XMLNamespace for the new
style with c(name = uri) form.
This information makes comparing namespaces a lot simpler, e.g. in SOAP.
Version 1.3-2
Mainly fixes for internal nodes.
* Export XMLNode, XMLInternalNode, XMLInternalElementNode classes
* as() method for XMLInternalNode wasn't recognized properly because
the classes weren't exported.
Also, the internal function asRXMLNode() accepts trim and ignoreBlanks
arguments for cleaning up the XML node text elements that are created.
* export coerce methods.
Version 1.3-1
* parseURI() sets the port to NA if the value is 0.
Version 1.3-0
* The SAX parser now has a branches argument that identifies XML elements
which are to be built into (internal) nodes and then the sub-tree/node
is passed to the handler function specified in the element of the branches
argument. This mixes the efficient SAX event-driven parsing with the easier
programming tree-based model, i.e. DOM.
* XMLInternalNode objects in R now have extra class information identifying them as
as regular element, text, CDATA, PI, ...
Version 1.2-0
* names() method for XMLInternalNode
* [ method for XMLInternalDocument and string using XPath notation.
* getNodeSet() has support for default namespaces in the XML document.
It is available, by default, to the XPath expression with the prefix 'd'.
* Exported xmlNamespace() method for XMLInternalNode.
* xmlNamespaceDefinitions() made generic (S3) and new method for
XMLInternalNode class.
Version 1.1-1
* Change to handling entities in printing of regular R-level XML text nodes
created during xmlTreeParse() call. Identified by Ingo Feinerer.
* saveXML for an XMLNode object will take a file name and write to the corresponding
file, overwriting it if it already exists.
Version 1.1-0
* xpathApply and getNodeSet take functions to be applied to nodes in a node
set resulting from an XPath query.
Version 1.0-0
* Version skipped as it is not a milestone release, just ran out of numbers!
Version 0.99-94
Changes from Russell Almond and suggestions from Franck Giolat for creating XML in R
* xmlNode() puts the names on children if omitted. Caller can use names other
than the XML element name (but this is not necessarily advisable).
* Added xmlChildren() method to set the children.
* Printing of an XML node to the console handles empty nodes and text nodes better.
* xmlTextNode() will replace reserved characters with their entity equivalent, e.g.
& with & and < with <. One can specify the entity vector including providing
an empty one should one want to avoid replacement.
Version 0.99-93
Changes from Martin Morgan
* import normalizePath from utils.
* Changes to configure.win to find 3rd party DLLs in bin/ directory, not lib/
Version 0.99-92
* Fix for setting DTD entity field uncovered by the strict type checking in R internals.
Version 0.99-91
* Added an encoding argument to saveXML(), initially for use in the Sxslt package.
Version 0.99-9
* Example of using namespaces in getNodeSet()
* Examples for xmlHashTree().
Version 0.99-8
* Introduced initial version of flat trees for storing the DOM in a
non-hierarchical data structure in R. This allows us to work with
a mutable tree and to perform certain operations across all the
nodes more efficiently, i.e. non-recursively. Importantly, one
can find the parent node of a given node in the tree which is not
possible with the list of list approach. It does mean more
computation for some common operations, specifically parsing.
Indeed, it can be 25 times slower for a non-trivial file, i.e. one
with. However, for a file with 7700 nodes, it still only takes 2
1/2 seconds. So there is a trade-off. While there are a few
versions in the code, xmlHashTree() is the one to use for speed
reasons. xmlFlatListTree() is another and xmlFlatTree() is
excruciatingly slow. See tests/timings.R for some comparisons.
xmlGetElementsByTagName and other facilities work on these types
of trees.
More functions and methods can and should be provided to work with
these trees if they turn out to be used in any significant way.
* add the R attribute 'namespaces' to an XML node's attributes
vector so that one can differentiate between conflicting attribute
names with different namespaces.
* added parseURI() to return the elements of a URI from a string.
Version 0.99-7
* Example of reading HTML tables using XPath and internal nodes in bondsTables.R
* Some additional methods for XMLInternalNode.
Version 0.99-6
* configure does not require the GNU sed, but can use any version of sed now that the
use of + in the regular expression has been removed.
Version 0.99-5
* Added append.XMLNode and append.xmlNode to the exported symbols from the NAMESPACE
file.
Version 0.99-4
* Fix for addComment() in xmlOutputDOM().
* Removed all the compilation warnings about interchanging xmlChar* and char*.
Version 0.99-3
* Added support in print methods for XML objects for indent = FALSE,
and tagSeparator, which defaults to "\n". These can be used to print
a faithful representation of an original XML document, but only when
used in combination with
xmlTreeParse( skipBlanks = FALSE, trim = FALSE)
Version 0.99-2
* Problems compiling with libxml2-2.5.11 and libxml2-2.6.{1,2}, so
we now test for a recent version of libxml. The test uses sed -r
which may cause problems. If one really wants to avoid the tests
set the environment variable FORCE_XML2 to any value before running
R CMD INSTALL XML.
* Documentation for getNodeSet() didn't refer to the new namespaces argument.
Version 0.99-1
* getNodeSet() takes a namespaces argument which is named character vector of
prefix = URI pairs of namespaces used in the XPath expression.
* Handlers for xmlEventParse() can include startDocument and endDocument elements
to catch those particular events. Useful for closing connections and general cleanup,
especially in the "pull" data source, i.e. connections or functions.
* xmlEventParse() when called with a function as the data source now doesn't have
a new line appended to each string returned to the parser by the function.
* Passing a connection to xmlEventParse() now uses a regular R function to call
readLines(con, 1) and no longer does this via C code to call readLines().
* Fix to the example in xmlEventParse() using the state variable.
Version 0.99-0
* Implementation for the endElement in the xmlEventParse() for saxVersion == 2.
* In xmlEventParse( , saxVersion = 2), the namespaces come as a named vector
in the fourth argument.
Version 0.98-1
* Messages from errors are now more informative. Using saxVersion = 2 in xmlEventParse(), you get
get the line and column information about the error.
Version 0.98
* Added saxVersion parameter to xmlEventParse() to control which interface is used at the C level.
This changes the arguments to the startElement handler, adding the namespace for the
element.
* Added xmlValidity() function to set the value of the default validity action. This allows us to do the
setting in the R code. This is currently not exported.
* Added recursive parameter to xmlElementsByTagName() function. This provides functionality
similar to getElementsByTagName() in XML parsing APIs for other languages.
* xmlTreeParse() called with no handlers and useInternalNodes returns a reference to the
C-level xmlDocPtr instance. This is an object of class "XMLInternalDocument". This can be
used in much the same way as the regular "XMLDocument" tree returned by xmlTreeParse,
e.g. xmlRoot, etc.
* Added getNodeSet() to evaluate XPath expressions on an XMLInternalDocument object.
* Added a validate parameter to the xmlEventParse() function.
Version 0.97-8
* Fix error where CDATA nodes and potentially other types of nodes (without element names) were being
omitted from the R tree in a simple call to xmlTreeParse("filename") (i.e. with no handlers).
Version 0.97-7
* Documentation updates.
Version 0.97-6
* useInternalNodes added to xmlTreeParse() and htmlTreeParse().
This allows one to avoid the overhead of converting the contents of nodes to
R objects for each handler function call. Also, can access parents, siblings,
etc. from within a handler function.
* Included parameterizations for Windows from Uwe Ligges to aid automated-building
and finding the libxml DLL at run time.
Version 0.97-5
* Methods for accessing component of XMLInternalDocument and XMLInternalNode objects,
e.g. xmlName, xmlNamespace, xmlAttrs, xmlChildren
* saveXML.XMLInternalDOM now supports specification of a Doctype (see Doctype).
* saveXML uses NextMethod and arguments are transferred. Identified by Vincent Carey.
* Suppress warnings from R CMD check.
* Change of the output file in saveXML() example to avoid conflict with Microsoft
Windows use of name con.xml.
Version 0.97-4
* Quote URI values in namespace definitions in print.XMLNode.
Version 0.97-3
* Added a method for xmlRoot for HTMLDocument
* Changed the maintainer email address.
Version 0.97-2
* Added cdata to the collection of functions that are used in the handlers
for xmlEventParse(). Omission identified by Jeff Gentry.
* Fixed the maintainer email address to duncan@wald.ucdavis.edu
Version 0.97-1
* Put the correct S3method declarations in the NAMESPACE.
Version 0.97-0
* Using a NAMESPACE for the package
Version 0.96-0
* Using libxml2 by default rather than libxml.
* Fixed typo. in PACKAGE when initializing the library.
Version 0.95-7
* When creating a namespace identifier, if the namespace doesn't have an href, then we put
in an string.
Version 0.95-6
* Documentation updates for synchronization with the code.
Version 0.95-5
* Trivial bug of including extra arguments in call to UseMethod for
dtdElementValidEntry that generated warnings.
Version 0.95-4
* Configuration now tries to find libxml 1, then libxml 2 unless explicitly
instructed to find libxml 2 via --with-libxml2. So the change is to pick
up libxml 2 if libxml 1 is not found rather than signal an error.
Version 0.95-3
* Remove the need to define xmlParserError. Instead, set the value of the error
routine/function pointer to our error handler in the different default handlers
in libxml. We now initialize these default objects when we load the library.
* When setting the environment variables LIBXML_INCDIR and LIBXML_LIBDIR, one
needs to specify the -I and -L prefixes for the compiler and linker respectively
in front of directory names.
* Detect whether the routine for xmlHashScan (in libxml2) provides a return value
or not. This changed in version 2.4.21 of libxml2.
Version 0.95-2
* Configuration detects Darwin and handles multiplicity of xmlParserError
symbol.
Version 0.95-1
* Configuration now supports the specification of the xml-config script
to use via the environment variable XML_CONFIG or the --with-xml-config
as in --with-xml-config=xml2-config
* Recognize file:/// prefix as URL and not switch to treating file name as
XML text.
Version 0.95-0
* Event-driven parsing (SAX) can take a connection object or a function
that is called when the parser needs more input. See the documentation
for xmlEventParse().
* Classes and methods explicitly created during the installation.
This will cause problems with namespaces until the saving of the image
model works with namespaces.
Version 0.94-1
* Minor change to configuration script to avoid -L-L in specification of
directory for XML library (libxml).
Version 0.94-0
* Use registration of C routines
* Added methods for saveXML for XMLNode and XMLOutputStream objects.
Version 0.93-4
* replaceEntities argument for xmlEventParse.
* S4 SAX methods assigned to the correct database.
Version 0.93-3
* Correct support for DTDs and namespaces in the internal nodes
used in xmlTree(). Errors identified by Vincent Carey.
Version 0.93-2
* Bug in trimming white space discovered by Ott Toomet.
Version 0.93-1
* Documentation updates. Included xmlGetAttr.Rd.
Version 0.93-0
* Added toString.XMLNode
* Fixed the printing of degenerate namespaces in an XML node,
i.e. the spurious `:'.
Version 0.92-2
* Fixed C bug caused by using namespace without a suffix,
e.g. xmlns="http:...." assumed prefix was present.
Thanks to David Meyer.
Version 0.92-1
* Display the namespace definitions when printing an XMLNode object.
* New addAttributeNamespaces argument for xmlTreeParse() that controls whether
namespaces are included in attribute names.
Version 0.92-0
* XMLNode class now contains a field for namespace definitions
The `namespace' field is a character string identifying the prefix's
namespace. The `namespaceDefinition' field contains the full definitions
of each of the namespaces defined within a node.
* Printing of XLM nodes displays the namespace.
* xmlName() takes a `full' argument that controls whether the
namespace prefix is prepended to the tag name.
Version 0.91-0
* Added a mechanism to the SAX parser to allow a state object
be passed between the callbacks and returned as the result of
the parsing. This avoids the need for closures. Also, works
with S4 classes and the genericSAXHandlers() methods by allowing
one to write methods for these generic callbacks that dispatch
based on the type of the state object.
* Fix to make work properly with S4 class system.
Version 0.9-1
* Formatting of the help files to avoid long lines
identified by Ott Toomet
* Addition of `ignoreComments' argument for xmlValue()
* Date in the DESCRIPTION file corrected (thanks to Doug Bates).
Version 0.9-0
* Added addCData() and addPI() to the handlers of the different
XMLOutputStream classes.
Code for XMLInternalDOM (i.e. xmlTree()) from Byron Ellis.
* print() method for XMLProcessingInstruction node has the terminating `?'
as in .
Version 0.8-2
* Changes to support libxml2-2.4.21 (specifically the issues with
the headers and parse error regarding xmlValidCtxt). Thanks to
Wolfgang Huber for identifying this.
* Ignoring R_VERSION now, so dependency is R >= 1.2.0
Version 0.8-1
* Added an `attrs' argument to the xmlOutputBuffer and xmlTree
functions for specifying the top-level node.
Version 0.8-0
* xmlValue() extended to work recursively if a node has
only one child.
* T and F replaced by TRUE and FALSE
Version 0.7-4
* Support for Windows
Version 0.7-3
* Documents without are handled correctly.
* Configuration tweak to set LD_LIBRARY_PATH to handle the case
that the user specifies LIBXML_LIBDIR and it is needed to run the
version test.
* Keyword XML changed to IO.
Version 0.7-2
* Fix for printing XMLNode objects to handle comments and elements
with name "text". Identified by Andrew Schuh.
Version 0.7-1
* Minor fixes for passing R CMD check.
Version 0.7-0
* Generating XML trees using internal libxml structures:
xmlTree(), newXMLDoc(), newXMLNode(), saveXML().
* Support parsing HTML (htmlTreeParse()) using DOM.
Suggestion from Luis Torgo.
* Additional updates for libxml2, relating to DTDs.
Version 0.6-3
* Installation using --with-xml2 now attempts to link against libxml2.so
and the appropriate header files.
* Use libxml's xml-config or xml2-config scripts if these are available.
Version 0.6
* xmlDOMApply for recursively applying a function to each node in a tree.
Version 0.5-1
* simplification of xmlOutputBuffer so that it doesn't put
the namespace definition in each and every tag.
* configuration changes to support libxml2-2.3.6
(look for libxml2, check if xmlHashSize is available)
* now dropping nodes if the handler function returns NULL.
Updated documentation.
* spelling correction in the documentation
Version 0.5
* xmlOutputBuffer now accepts a connection.
* Fixes for using libxml2, specifically 2.2.12.
Also works for libxml2.2.8
* Enhanced configuration script to determine what features are available.
Version 0.4
* `namespace' handler in xmlTreeParse is called when a namespace
declaration is encountered. This is called before the child nodes
are processed.
* More documentation, in Tour.
* xmlValue, xmlApply, xmlSApply, xmlRoot, xmlNamespace, length, names
* Constructors for different types of nodes: XMLNode, XMLTextNode, XMLProcessingInstruction.
* Methods for print(), subsetting ([ and [[), accessing the fields
in an XMLNode object.
* New classes for the different node types (e.g. XMLTextNode)
* Event driven parsing available via libxml. Expat is not needed but
can be used.
* Document sources can be URLs (ftp and http) when using the libxml parser.
* Examples for processing MathML and SVG files. See examples/ directory.
* Examples for event driven parsing.
* Class of result from xmlTreeParse is XMLDocument.
* Comments, Entities, Text, etc. inherit from XMLNode
in addition to defining their own XML class.
XML/NAMESPACE 0000644 0001751 0000144 00000024370 14405636156 012167 0 ustar hornik users # Avoided if we are using Windows so that we can specify the directory to find the libxml DLL.
# useDynLib(XML)
importFrom(utils, menu)
importFrom(grDevices, dev.off, jpeg, pdf, png)
import(methods)
export(
append.xmlNode,
append.XMLNode,
Doctype,
asXMLNode,
comment.SAX,
dtdElement,
dtdElementValidEntry,
dtdEntity,
dtdIsAttribute,
dtdValidElement,
endElement.SAX,
entityDeclaration.SAX,
genericSAXHandlers,
getNodeSet,
xpathApply,
htmlTreeParse,
htmlParse,
libxmlVersion,
xmlDoc,
newHTMLDoc,
newXMLDoc,
newXMLNode,
newXMLNamespace,
newXMLPINode,
newXMLTextNode,
newXMLCommentNode,
newXMLCDataNode,
newXMLDTDNode,
parseDTD,
processingInstruction.SAX,
saveXML,
startElement.SAX,
supportsExpat,
supportsLibxml,
text.SAX,
toString.XMLNode,
xmlApply,
xmlAttributeType,
xmlAttrs,
"xmlAttrs<-",
xmlCDataNode,
xmlChildren,
xmlCommentNode,
xmlContainsElement,
xmlContainsEntity,
xmlDOMApply,
xmlElementsByTagName,
xmlEventHandler,
xmlEventParse,
# new.xmlEventParse,
# new.xmlTreeParse,
xmlGetAttr,
xmlHandler,
xmlName,
xmlNamespace,
xmlNode,
xmlOutputBuffer,
xmlOutputDOM,
xmlPINode,
xmlParent,
xmlAncestors,
xmlRoot,
xmlSApply,
xmlSize,
xmlSize.default,
xmlTextNode,
xmlTree,
xmlTreeParse,
xmlInternalTreeParse,
xmlNativeTreeParse,
xmlParse,
xmlValue,
names.XMLNode,
parseURI,
asXMLTreeNode,
xmlHashTree,
addNode,
xmlNamespaceDefinitions,
xmlNamespaces,
matchNamespaces,
getDefaultNamespace,
catalogResolve,
toHTML,
addChildren,
removeChildren,
removeNodes,
addAttributes,
removeAttributes,
"xmlName<-",
addSibling
# xmlFlatListTree
)
S3method(removeNodes, "list")
S3method(removeNodes, "XMLNodeSet")
S3method(removeNodes, "XMLNodeList")
S3method(removeNodes, "XMLInternalNode")
exportMethods("addAttributes", "removeAttributes")
exportMethods("toHTML")
export("xmlChildren<-")
exportMethods("xmlChildren<-")
exportClasses("XMLInternalDocument", "XMLAbstractDocument")
exportClass("URI")
if(TRUE) {
exportClasses("XMLAbstractNode", "XMLNode")
exportClasses("HTMLInternalDocument")
exportClasses("XMLInternalNode", "XMLInternalElementNode",
"XMLInternalTextNode", "XMLInternalPINode", "XMLInternalCDataNode",
"XMLInternalCommentNode", "XMLDTDNode",
"XMLXIncludeStartNode", "XMLXIncludeEndNode", "XMLEntityDeclNode",
"XMLAttributeDeclNode", "XMLDocumentNode", "XMLDocumentTypeNode",
"XMLDocumentFragNode", "XMLNamespaceDeclNode")
exportClass("XMLTreeNode")
exportClass(XMLNamespace)
exportClass(XMLNamespaceDefinitions)
}
exportMethods("coerce")
exportMethods("free")
S3method(removeChildren, XMLNode)
S3method(removeChildren, XMLInternalNode)
exportClasses(Percent, FormattedNumber, FormattedInteger)
S3method(xpathApply, XMLInternalNode)
S3method(xpathApply, XMLInternalDocument)
S3method(xpathApply, XMLNode)
export(xpathSApply)
S3method(xmlNamespaceDefinitions, XMLNode)
S3method(xmlNamespaceDefinitions, XMLInternalDocument)
S3method(xmlNamespaceDefinitions, XMLInternalNode)
S3method(xmlNamespaceDefinitions, XMLAbstractDocument)
#XXX S3method(xmlNamespaceDefinitions, XMLHashTreeNode)
#S3method(names, XMLFlatTree)
#S3method("$", XMLFlatListTree)
S3method(addNode, XMLHashTree)
S3method(xmlRoot, XMLHashTree)
S3method(print, XMLHashTree)
S3method(print, XMLInternalDocument)
S3method(print, XMLInternalNode)
S3method(print, XMLRDocument)
S3method(xmlRoot, XMLRDocument)
S3method(xmlRoot, HTMLDocument)
if(TRUE) {
S3method(xmlName, XMLComment)
S3method(xmlName, XMLNode)
S3method(xmlName, XMLInternalNode)
} else
# S4 version
exportMethods(xmlName)
S3method("names<-", XMLNode)
S3method("xmlName<-", XMLNode)
S3method("xmlName<-", XMLInternalElementNode)
exportMethods("xmlAttrs<-")
if(TRUE) {
S3method(xmlChildren, XMLTreeNode)
S3method(xmlChildren, XMLInternalDocument)
S3method(xmlChildren, XMLHashTreeNode)
S3method(xmlChildren, XMLNode)
S3method(xmlChildren, XMLInternalNode)
} else
exportMethods("xmlChildren")
if(FALSE) {
S3method(xmlParent, XMLTreeNode)
S3method(xmlParent, XMLHashTreeNode)
S3method(xmlParent, XMLInternalNode)
} else
exportMethods(xmlParent)
S3method(xmlSize, XMLHashTreeNode)
S3method(xmlSize, XMLHashTree)
S3method(xmlRoot, XMLHashTree)
S3method(xmlRoot, XMLInternalDOM)
S3method(xmlRoot, XMLInternalNode)
S3method(addChildren, XMLInternalNode)
S3method(addChildren, XMLInternalDocument)
S3method(addChildren, XMLNode)
export(replaceNodes)
S3method(replaceNodes, XMLInternalNode)
S3method(xmlSize, XMLInternalNode)
S3method(xmlValue, XMLInternalNode)
S3method(xmlValue, XMLNodeSet)
S3method(xmlValue, list)
#exportS3method("xmlValue", "NULL")
S3method("xmlValue", "NULL")
S3method("[", XMLNode)
if(TRUE) {
S3method("[[", XMLNode)
S3method("[[", XMLDocumentContent)
S3method("[[", XMLInternalNode)
S3method("[[", XMLInternalDocument)
S3method("[[", XMLHashTreeNode)
S3method("[[", XMLInternalElementNode)
}
S3method("[", XMLInternalNode)
S3method("[", XMLInternalDocument)
S3method("names", XMLInternalNode)
S3method("[<-", XMLNode)
S3method("[[<-", XMLNode)
S3method("[[<-", XMLInternalNode)
exportClass("XMLAttributes")
exportMethods("[")
export(xmlNamespaceDefinitions)
S3method(names, XMLNode)
S3method(length, XMLNode)
if(TRUE) {
S3method(xmlAttrs, XMLNode)
S3method(xmlAttrs, XMLInternalNode)
S3method(xmlAttrs, XMLElementDef)
} else
exportMethods("xmlAttrs")
S3method(xmlSize, XMLDocument)
S3method(xmlSize, default)
S3method(xmlSize, XMLNode)
S3method(print, XMLNode)
S3method(print, XMLTextNode)
S3method(print, XMLComment)
S3method(print, XMLCommentNode)
S3method(print, XMLEntityRef)
S3method(print, XMLCDataNode)
S3method(print, XMLProcessingInstruction)
S3method(xmlRoot, XMLDocument)
S3method(xmlRoot, XMLInternalDocument)
S3method(xmlRoot, XMLDocumentContent)
S3method(xmlApply, XMLNode)
S3method(xmlApply, XMLDocument)
S3method(xmlApply, XMLDocumentContent)
S3method(xmlApply, XMLInternalNode)
S3method(xmlSApply, XMLNode)
S3method(xmlSApply, XMLDocument)
S3method(xmlSApply, XMLDocumentContent)
S3method(xmlSApply, XMLInternalNode)
S3method(xmlSApply, XMLNodeSet)
S3method(xmlApply, XMLNodeSet)
if(TRUE) {
S3method(xmlValue, XMLNode)
S3method(xmlValue, XMLTextNode)
S3method(xmlValue, XMLComment)
S3method(xmlValue, XMLCDataNode)
S3method(xmlValue, XMLProcessingInstruction)
} else
exportMethods("xmlValue")
S3method(addSibling, XMLInternalNode)
S3method(xmlNamespace, XMLNode)
S3method(xmlNamespace, XMLInternalNode)
S3method(xmlNamespace, character)
## formerly an unexported generic append() with uexported copy append.xmlNode()
S3method(append.xmlNode, XMLNode, append.XMLNode)
S3method(append.xmlNode, default)
exportMethods(saveXML)
# S3method(saveXML, XMLInternalDocument)
# S3method(saveXML, XMLInternalDOM)
# S3method(saveXML, XMLInternalNode)
# S3method(saveXML, XMLOutputStream)
# S3method(saveXML, XMLNode)
# S3method(saveXML, XMLFlatTree)
S3method(dtdElementValidEntry, XMLElementDef)
S3method(dtdElementValidEntry, XMLOrContent)
S3method(dtdElementValidEntry, XMLElementContent)
S3method(dtdElementValidEntry, character)
S3method(dtdElementValidEntry, XMLSequenceContent)
export(docName)
if(FALSE) {
S3method(docName, XMLDocument)
S3method(docName, XMLDocumentContent)
S3method(docName, XMLInternalDocument)
S3method(docName, XMLInternalNode)
S3method(docName, XMLHashTree)
} else
exportMethods(docName)
export("xmlNamespaces<-")
exportMethods("xmlNamespaces<-")
export("docName<-")
exportMethods("docName<-")
exportClass("SAXState")
export(xmlSource)
exportMethods("xmlSource")
export(xmlSourceFunctions)
exportMethods("xmlSourceFunctions")
export(xmlSourceSection)
exportMethods("xmlSourceSection")
# Not yet exported....
#xmlValidity
if(TRUE) {
exportClasses("ExternalReference", "xmlSchemaRef", "libxmlTypeTable")
exportClasses("SchemaElementTable", "xmlSchemaElementRef",
"SchemaTypeTable", "xmlSchemaTypeRef",
"SchemaAttributeTable", "xmlSchemaAttributeRef",
"SchemaAttributeGroupTable", "xmlSchemaAttributeGroupRef",
"SchemaNotationTable", "xmlSchemaNotationRef")
export(xmlSchemaValidate, schemaValidationErrorHandler, xmlSchemaParse)
exportMethods("names", "$", "$<-", "coerce")
}
#importFrom(utils, normalizePath)
export(getSibling)
S3method(getSibling, XMLInternalNode)
S3method(getSibling, XMLHashTreeNode)
export(catalogLoad, catalogClearTable, catalogAdd, catalogDump)
export(xmlStructuredStop, xmlErrorCumulator) # xmlStop
export(xmlStopParser)
export(getXMLErrors)
export(processXInclude)
S3method(processXInclude, list)
S3method(processXInclude, XMLInternalDocument)
S3method(processXInclude, XMLInternalElementNode)
exportMethods(show)
export(xmlElementSummary)
#, xmlElementSummaryHandlers)
#export(xmlNodeMatch)
#export(getRCode)
export(xmlParserContextFunction)
export(getRelativeURL)
export(xmlToList)
export('xmlValue<-')
exportMethods('xmlValue<-')
export(getEncoding)
exportMethods(getEncoding)
exportClass("XMLCodeFile")
exportClass("XMLCodeDoc")
exportMethods("[[")
export(xmlCodeFile)
exportMethods(source)
export(xmlClone)
export(findXInclude)
export(getLineNumber, getNodeLocation, getNodePosition)
export(ensureNamespace)
export(removeXMLNamespaces)
exportMethods(removeXMLNamespaces)
export(xmlParseDoc)
export(RECOVER,
NOENT,
DTDLOAD,
DTDATTR,
DTDVALID,
NOERROR,
NOWARNING,
PEDANTIC,
NOBLANKS,
SAX1,
XINCLUDE,
NONET,
NODICT,
NSCLEAN,
NOCDATA,
NOXINCNODE,
COMPACT,
OLD10,
NOBASEFIX,
HUGE,
OLDSAX)
export(libxmlFeatures)
exportClass("XMLString")
export(xml, xmlParseString, isXMLString)
export(readHTMLTable)
export(xmlToS4, makeClassTemplate) # xmlToS4List
exportMethods("xmlToS4")
export(xmlToDataFrame)
exportMethods(xmlToDataFrame)
export(compareXMLDocs)
S3method(summary, XMLInternalDocument)
export(parseXMLAndAdd)
#exportClass("XPathNodeSet")
export(xmlSerializeHook, xmlDeserializeHook)
if(FALSE) {
export(clearMemoryManagement)
exportMethods(clearMemoryManagement)
}
export("xmlParent<-")
export(xmlSearchNs)
export("xmlNamespace<-", setXMLNamespace)
export(readKeyValueDB, readSolrDoc)
export(getChildrenStrings)
export(getHTMLLinks)
export(readHTMLList)
export(getXIncludes, xmlXIncludes)
export(getHTMLExternalFiles)
export(xmlCleanNamespaces)
export(replaceNodeWithChildren)
S3method(toString, XMLNode)
XML/LICENSE 0000644 0001751 0000144 00000000234 14516227653 011747 0 ustar hornik users YEAR: 2015
COPYRIGHT HOLDER: Duncan Temple Lang, Bell Labs, Lucent Technologies, University of California, Davis; CRAN Team
ORGANIZATION: copyright holder
XML/configure.ac 0000644 0001751 0000144 00000057362 14120536300 013225 0 ustar hornik users # An input file for autoconf to configure
# the XML parsing facilities for both R and S.
# Currently this works for R.
#
# This was originally implemented by Friedrich Leisch
# with modifications for subsequent versions
# by Duncan Temple Lang.
#
AC_INIT
AC_CONFIG_SRCDIR([DESCRIPTION])
dnl The different command line arguments for configure.
dnl They can also be specified by setting environment variables.
dnl
dnl Establish the command line arguments accepted by this script.
dnl
dnl
dnl whether to use Splus.
AC_ARG_WITH(splus,[ --with-splus Compile as an SPlus library (rather than R). Value can be the (fully qualified) name of the Splus script.], USE_SPLUS=1)
dnl explicitly force the use of the old names. If this doesn't match the header
dnl files that are actually found, then
AC_ARG_WITH(oldlibxml,[], USE_OLD_ROOT_CHILD_NAMES=1; FORCE_OLD=1;echo "Using old libxml names")
dnl tell the configuration that we are using libxml2.
AC_ARG_WITH(libxml2,[ --with-libxml2 indicate that the libxml version is 2.0 or higher],
[ if test "${withval}" = "yes" ; then
LIBXML2="-DLIBXML2=1";
USE_XML2="yes" ;
fi], USE_XML2="yes")
AC_ARG_WITH(xml-config,[ --with-xml-config the name of the xml-config program to use.],
[ XML_CONFIG=${withval}])
dnl compile for use with libxml. This is the default.
AC_ARG_WITH(libxml, [ --with-libxml use the libxml library (default)],
[ if test "${withval}" = no; then
USE_LIBXML=false;
else
USE_LIBXML=true;
fi],
USE_LIBXML=true)
# Default is false for expat since we can
# do event driven parsing with libxml.
AC_ARG_WITH(expat,
[ --with-expat use expat library (off by default)],
[ if test "${withval}" = no; then
USE_EXPAT= ;
else
USE_EXPAT=true;
fi],
USE_EXPAT= )
dnl Here we add a flag which we will use below in the case that
dnl the user declaratively adds this option in.
AC_ARG_WITH(xml_output_buffer,
[ --with-xml-output-buffer use ADD_XML_OUTPUT_BUFFER_CODE (conditionally on)],
[ if test "${withval}" = "yes" ; then
ADD_XML_OUTPUT_BUFFER="yes";
else
ADD_XML_OUTPUT_BUFFER="no";
fi],
ADD_XML_OUTPUT_BUFFER=no)
dnl
dnl End of command line argument declarations.
dnl Now compute the relevant settings.
dnl
dnl Get the C compiler, including any values set by the user
dnl We need this to perform compilation and link checks.
AC_PROG_CC
AC_PROG_CPP
dnl ======================================================
dnl Check whether we are compiling this for use with SPlus
dnl and if so, figure out which version.
if test -n "${USE_SPLUS}" ; then
# Allows the user to say --with-splus=/usr/local/bin/Splus5
# This could be fooled, but unlikely unless the user does something
# "clever"
if test -x ${with_splus} ; then
SPLUS=${with_splus}
else
SPLUS=Splus
fi
# Get the major version of the Splus being run.
dnl Shouldn't this be perl -epn
SPLUS_VERSION=`echo 'cat(version$major,"\n",sep="")' | ${SPLUS} | perl -e 'while(){ $x = $_;} printf $x;'`
# If this is version 3, we are in trouble.
if test ${SPLUS_VERSION} -lt 5 ; then
echo "This package does not work with SPlus 3, but only SPlus 5 and 6"
exit 1
fi
fi # end of USE_SPLUS.
AC_ARG_WITH(xmlsec, [ --with-xmlsec add support (experimental) for XML security with xmlsec. Specify no, xmlsec1 or xmlsec1-openssl],
[ if test "${withval}" = no; then
USE_XMLSEC=false
else
USE_XMLSEC=${withval}
fi],
USE_XMLSEC=true)
dnl
dnl Redirection: http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-3.html
if test -n "" ; then
# turned off for now
#XXX get the redirection correct
"$R_HOME/bin/R" CMD SHLIB testRemoveFinalizers.c &> AS_MESSAGE_FD
if test "$?" = 0 ; then
$R_HOME/bin/R --no-echo --vanilla < testRemoveFinalizers.R &> AS_MESSAGE_FD
fi
if ! test "$?" = 0 ; then
echo "No ability to remove finalizers on externalptr objects in this verison of R";
EXPORT_MEMORY_MANAGEMENT="FALSE"
else
echo "Have R_RemoveExtPtrWeakRef" ;
PKG_CPPFLAGS="$PKG_CPPFLAGS -DR_HAS_REMOVE_FINALIZERS=1";
EXPORT_MEMORY_MANAGEMENT="TRUE"
fi
else
EXPORT_MEMORY_MANAGEMENT="FALSE"
fi
dnl ======================================================
AC_PATH_PROGS(SED, sed)
AC_PATH_PROGS(PKG_CONFIG, pkg-config)
dnl In this section, we try to find the different
dnl characteristics of the libxml library.
dnl We are looking to see if
dnl a) it is version 1.8.* or version 2.*
dnl b) whether it is installed with include files in /gnome-xml
dnl or in a private, uninstalled form in which case the include
dnl directory is usually libxml/
if test -n "${USE_LIBXML}" ; then
dnl If the user gave us xml*-config, then use that.
if test -n "${XML_CONFIG}" && test -x "${XML_CONFIG}" ; then
USING_USER_XML_CONFIG="yes"
XML_VERSION="`${XML_CONFIG} --version | ${SED} -e 's/\..*//g'`"
if test "${XML_VERSION}" = "2" ; then
USE_XML2="yes"
LIBXML2="-DLIBXML2=1";
fi
echo "User defined xml-config: ${XML_CONFIG}, XML Version: ${XML_VERSION}, XML2: ${USE_XML2}"
fi
LANGUAGE_DEFS="${LANGUAGE_DEFS} -DHAVE_VALIDITY=1"
dnl if the user has not specified anything about libxml,
dnl then lets look for xml-config. We let the user give this
dnl as an environment variable `XML_CONFIG'.
if test -z "${LIBXML_INCDIR}" && test -z "${LIBXML_LIBDIR}" ; then
dnl find xml*-config
dnl If they ask explicitly for xml2, find it or fail otherwise.
if test "${USE_XML2}" = "yes" ; then
if test -z "${XML_CONFIG}" ; then
AC_PATH_PROGS(XML_CONFIG, xml2-config)
if test -z "${XML_CONFIG}" ; then
echo "Cannot find xml2-config"
exit 1
fi
fi
fi
dnl Otherwise, if they implicitly ask for xml-config
dnl find that.
if test -z "${XML_CONFIG}" ; then
AC_PATH_PROGS(XML_CONFIG, xml-config)
fi
dnl and if they don't have libxml version 1, see if they
dnl have libxml2
if test "${USE_XML2}" = "maybe" ; then
if test -z "${XML_CONFIG}" ; then
AC_PATH_PROGS(XML_CONFIG, xml2-config)
if test -z "${XML_CONFIG}" ; then
echo "Cannot find xml2-config"
exit 1
else
echo "Using libxml version `$XML_CONFIG --version`"
fi
fi
fi
if test -n "${XML_CONFIG}" ; then
echo "USE_XML2 = ${USE_XML2}"
if test "${USE_XML2}" != "no" && test -z "${FORCE_XML2}"; then
dnl This is not needed now, but is a way to test whether we should use -E or -r to get
dnl extended regular expression usage with this version of sed.
echo "foo" | sed -Ee 's/foo/bar/' > /dev/null 2>&1
if test "$?" = "0" ; then
SED_EXTENDED_ARG="-E"
else
SED_EXTENDED_ARG="-r"
fi
echo "SED_EXTENDED_ARG: ${SED_EXTENDED_ARG}"
MINOR=`${XML_CONFIG} --version | ${SED} -e 's/^2\.\([[0-9]]\{1,\}\).*/\1/'`
PATCH=`${XML_CONFIG} --version | ${SED} -e 's/^2\.[[0-9]]\{1,\}\.\([[0-9]]\{1,\}\)$/\1/'`
echo "Minor $MINOR, Patch $PATCH for `$XML_CONFIG --version`"
if test $MINOR -lt 6 ; then
echo ""
echo "**** You should use a recent version of libxml2, i.e. 2.6.22 or higher ****"
echo ""
exit 1
fi
if test "$MINOR" -eq 6 -a "$PATCH" -lt 3 ; then
echo ""
echo "**** There are problems compiling this package with libxml2-2.6.1 or libmxml2-2.6.2. ****"
echo "**** You will probably encounter compilation errors, so we are terminating the build. ****"
echo ""
exit 1
fi
fi
LIBXML_INCDIR=`${XML_CONFIG} --cflags`
LIBXML_LIBDIR=`${XML_CONFIG} --libs`
FOUND_LIBXML_INCLUDES="Ok"
fi
fi dnl USE_XML2
dnl If the user has specified LIBXML_INCDIR, then
dnl we use that.
dnl Otherwise, we try to find the parser.h file.
if test -n "${LIBXML_INCDIR}" && test -z "${XML_CONFIG}" ; then
echo "Checking directory of LIBXML_INCDIR"
if test -d $LIBXML_INCDIR ; then
dnl Maybe also test for ${LIBXML_INCDIR}/parser.h
dnl in case somebody points us directly at the include directory.
if test -r ${LIBXML_INCDIR}/libxml/parser.h ; then
FOUND_LIBXML_INCLUDES="Ok"
elif test -r ${LIBXML_INCDIR}/gnome-xml/parser.h ; then
FOUND_LIBXML_INCLUDES="Ok"
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DFROM_GNOME_XML_DIR=1"
else
echo "You specified LIBXML_INCDIR, but we couldn't find parser.h"
echo "Please specify it correctly and re-run the INSTALL'ation."
exit 1
fi
else
echo "The LIBXML_INCDIR value you specified ($LIBXML_INCDIR) is not a directory."
echo "Please specify it correctly and re-run the INSTALL'ation."
exit 1
fi
fi
dnl We should have exited if we cannot find parser.h
dnl LIBXML_INCDIR.
if test -z "${FOUND_LIBXML_INCLUDES}" ; then
dnl the idea is that we loop over different directories
dnl looking for parser.h. We look in the sub-directory
dnl gnome-xml/
TMP_CPPFLAGS=${CPPFLAGS}
for dir in ${LIBXML_INCDIR} /usr/local/include /usr/include ; do
CPPFLAGS="${TMP_CPPFLAGS} -I${dir}"
AC_CHECK_HEADER(libxml/parser.h, FROM_LIBXML_DIR=1)
if test -n "${FROM_LIBXML_DIR}" ; then
LIBXML_INCDIR="-I${dir}"
CPPFLAGS="${TMP_CPPFLAGS} -I${dir} -I${dir}/libxml"
PKG_CPPFLAGS="${TMP_CPPFLAGS} -I${dir} -I${dir}/libxml"
echo "Found the libxml parser.h in $dir/libxml/"
break
fi
CPPFLAGS="${TMP_CPPFLAGS} -I${dir}/gnome-xml"
AC_CHECK_HEADER(gnome-xml/parser.h, FROM_GNOME_XML_DIR=1)
if test -n "${FROM_GNOME_XML_DIR}" ; then
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DFROM_GNOME_XML_DIR=1"
CPPFLAGS="${CPPFLAGS} -DFROM_GNOME_XML_DIR=1"
LIBXML_INCDIR="-I${dir}"
echo "Found the gnome-xml parser in $dir"
break
fi
done
if test -z "${FROM_GNOME_XML_DIR}" ; then
CPPFLAGS=${TMP_CPPFLAGS}
fi
fi # end of -z FOUND_LIBXML_INCLUDES
if test -z "${LIBXML_INCDIR}"; then
AC_CHECK_HEADER(libxml/parser.h, LIBXML_INCDIR="libxml/")
fi
if test -z "${LIBXML_INCDIR}" ; then
echo "Cannot find parser.h. Set the value of the environment variable"
echo " LIBXML_INCDIR"
echo "to point to where it can be found."
exit 1;
else
echo "Located parser file ${LIBXML_INCDIR}/parser.h"
fi
dnl Do we need this? XXX
#LIBS="${LIBS} ${LIBXML_INCDIR}"
if test -z "${LIBXML2}" ; then
CPPFLAGS="${PKG_CPPFLAGS} ${LIBXML_INCDIR}"
echo "Checking for 1.8: ${CPPFLAGS}"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif]], [[ xmlAttr *attr;
attr->val = NULL;
]])],[echo "Using libxml 1.8.*!"],[LIBXML2="-DLIBXML2=1"; echo "Using libxml2.*" ])
fi
# AC_EGREP_HEADER(xmlParseFile, ${LIBXML_INCDIR}parser.h,
# HAVE_LIBXML_HEADER=true,
# AC_MSG_ERROR("header files for libxml seem to be incorrect"))
AC_CHECK_LIB(z, gzopen)
if test -n "${LIBXML2}" ; then
AC_CHECK_LIB(xml2, xmlParseFile, LIBS="${LIBS} -lxml2"; USE_XMLLIB_NAME=xml2, NO_XML_LIB=1, "${LIBXML_LIBDIR--L.}")
else
NO_XML_LIB=1
fi
if test -n "${NO_XML_LIB}" ; then
AC_CHECK_LIB(xml, xmlParseFile, LIBS="${LIBS} -lxml";USE_XMLLIB_NAME=xml, AC_MSG_ERROR("libxml not found"), "${LIBXML_LIBDIR--L.}")
fi
if test -n "${LIBXML_LIBDIR}" ; then
LIBS="${LIBXML_LIBDIR--L.} ${LIBS}"
LD_PATH="${LIBXML_LIBDIR-.}"
fi
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DLIBXML"
if test -z "${FROM_GNOME_XML_DIR}" ; then
PKG_CPPFLAGS="${PKG_CPPFLAGS} ${LIBXML_INCDIR--I.}"
fi
if test -z "${LIBXML2}" ; then
dnl Now we try to test whether we have a really old libxml
dnl which uses childs and root instead of xmlChildren and xmlRootNode
if test -z "${USE_OLD_ROOT_CHILD_NAMES}" ; then
CPPFLAGS=${PKG_CPPFLAGS}
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif]], [[ xmlDocPtr node;
node->xmlRootNode = NULL;
]])],[echo "New style libxml!"],[USE_OLD_ROOT_CHILD_NAMES=1; echo "Need to use old-style libxml names"])
echo "Using old root child names? ${USE_OLD_ROOT_CHILD_NAMES-0}"
fi # USE_OLD_ROOT_CHILD_NAMES
else # -z "${LIBXML2}"
CPPFLAGS=${PKG_CPPFLAGS}
if test -d "${LIBXML_LIBDIR}" ; then
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${LIBXML_LIBDIR}
export LD_LIBRARY_PATH
fi
AC_RUN_IFELSE([AC_LANG_SOURCE([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
int
main(int argc, char *argv[])
{
xmlCheckVersion(20000);
return(0);
}
]])],[ LIBXML2_OK=1],[LIBXML2_OK=0],[])
if test "${LIBXML2_OK}" = "0" ; then
echo "You are trying to use a version 2.* edition of libxml"
echo "but an incompatible library. The header files and library seem to be"
echo "mismatched. If you have specified LIBXML_INCDIR, make certain to also"
echo "specify an appropriate LIBXML_LIBDIR if the libxml2 library is not in the default"
echo "directories."
exit 1
fi
fi
if test -n "${USE_OLD_ROOT_CHILD_NAMES}" ; then
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DUSE_OLD_ROOT_CHILD_NAMES=1"
fi
fi
if test "${USE_XMLLIB_NAME}" = "xml2" ; then
AC_CHECK_LIB(xml2, xmlHashSize, echo "Using built-in xmlHashSize", PKG_CPPFLAGS="${PKG_CPPFLAGS} -DOWN_XML_HASH_SIZE=1")
fi
if test "${USE_LIBXML}" ; then
echo "Checking DTD parsing (presence of externalSubset)..."
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif]], [[
xmlParserCtxtPtr ctxt;
ctxt->inSubset = 0;
ctxt->sax->externalSubset = NULL;
]])],[USE_EXT_SUBSET=1],[])
if test -n "${USE_EXT_SUBSET}" ; then
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DUSE_EXTERNAL_SUBSET=1"
fi
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif]], [[
xmlNodePtr node;
int x;
x = node->type == XML_DTD_NODE;
]])],[ROOT_HAS_DTD_NODE=1],[echo "No XML_DTD_NODE defined"])
if test -n "${ROOT_HAS_DTD_NODE}" ; then
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DROOT_HAS_DTD_NODE=1"
fi
AC_CHECK_LIB(${USE_XMLLIB_NAME},xmlHashSize, echo "Found xmlHashSize", echo "No xmlHashSize")
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif]], [[
xmlEntityPtr ent;
ent->checked = 1;
]])],[ENTITY_HAS_CHECKED="yes"],[ENTITY_HAS_CHECKED="no"])
if test "${ENTITY_HAS_CHECKED}" = "no" ; then
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DNO_CHECKED_ENTITY_FIELD=1"
fi
fi
dnl Here we added the check of the flag to determine if the user wants to force
dnl the XML_OUTPUT_BUFFER code in XMLTree.c
AC_CHECK_LIB(${USE_XMLLIB_NAME}, xmlOutputBufferCreateBuffer, [echo "have xmlOutputBufferCreateBuffer()"; if test "${ADD_XML_OUTPUT_BUFFER}" = "yes" ; then PKG_CPPFLAGS="${PKG_CPPFLAGS} -DADD_XML_OUTPUT_BUFFER_CODE=1"; else ADD_XML_OUTPUT_BUFFER=no ; fi; ], [ echo "Using local xmlOutputBufferCreateBuffer. You might think about installing a newer version of libxml2, at least 2.6.23" ; PKG_CPPFLAGS="${PKG_CPPFLAGS} -DADD_XML_OUTPUT_BUFFER_CODE=1"; ADD_XML_OUTPUT_BUFFER=1])
AC_CHECK_LIB(${USE_XMLLIB_NAME}, xmlDocDumpFormatMemoryEnc, PKG_CPPFLAGS="${PKG_CPPFLAGS} -DDUMP_WITH_ENCODING=1")
if test -z "${FROM_GNOME_XML_DIR}" ; then
AC_CHECK_HEADER(libxml/xmlversion.h, PKG_CPPFLAGS="${PKG_CPPFLAGS} -DUSE_XML_VERSION_H=1")
fi
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif]], [[
xmlElementPtr el;
int x;
x = el->etype;
]])],[PKG_CPPFLAGS="${PKG_CPPFLAGS} -DXML_ELEMENT_ETYPE=1"
],[])
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif]], [[
xmlAttributePtr el;
int x;
x = el->atype;
]])],[PKG_CPPFLAGS="${PKG_CPPFLAGS} -DXML_ATTRIBUTE_ATYPE=1"
],[])
if test -n "${USE_EXPAT}" ; then
AC_CHECK_HEADER(xmltok/xmlparse.h, XMLPARSE_INCDIR="xmltok/")
if test -z "${XMLPARSE_INCDIR}" ; then
AC_CHECK_HEADER(xmlparse/xmlparse.h, XMLPARSE_INCDIR="xmlparse/")
fi
AC_EGREP_HEADER(XML_Parse, ${XMLPARSE_INCDIR}xmlparse.h,
HAVE_EXPAT_HEADER=true,
AC_MSG_ERROR("header file xmlparse.h seems to be incorrect"))
AC_CHECK_LIB(xmltok, XmlInitEncoding,,AC_MSG_ERROR("libxmltok not found"))
AC_CHECK_LIB(xmlparse, XML_Parse,,
AC_MSG_ERROR("libxmlparse not found"), -lxmltok)
PKG_CPPFLAGS="${PKG_CPPFLAGS} -DLIBEXPAT -I${XMLPARSE_INCDIR}"
LD_PATH="${LD_PATH}:${LIBXML_LIBDIR}"
fi
if test -n "${USE_EXPAT}" ; then
SUPPORTS_EXPAT="TRUE"
else
SUPPORTS_EXPAT="FALSE"
fi
echo "Expat: ${USE_EXPAT} ${SUPPORTS_EXPAT}"
if test -n "${USE_LIBXML}" ; then
SUPPORTS_LIBXML="TRUE"
else
SUPPORTS_LIBXML="FALSE"
fi
if test -z "${USE_SPLUS}" ; then
LANGUAGE_DEFS="-DUSE_R=1 -D_R_=1 ${LANGUAGE_DEFS}"
dnl PKG_SYS_FILE='system.file("scripts", name, pkg="RSPerl")'
else
dnl Test to see if the patch has been made to renamin the attribute()
dnl routine in libxml
if test ${SUPPORTS_LIBXML}="TRUE" ; then
AC_CHECK_LIB(xml, attribute, NEED_LIBXML_PATCH=1)
if test -n "${NEED_LIBXML_PATCH}" ; then
echo "The XML package will not work with S-Plus and the current libxml"
echo "because of a conflict from both having a routine named attribute()"
echo "We suggest that you modify the SAX.c file in the libxml and re-install."
echo "See PATCH.attribute in this package's distribution."
exit 1
fi
fi # ? SUPPORTS_LIBXML = "TRUE"
LANGUAGE_DEFS="-D_S_=1 -DUSE_S=1 -D_S4_=1 -D_SPLUS${SPLUS_VERSION}_ -DNO_SPLUS_THREAD_DEF=1 ${LANGUAGE_DEFS}"
INSTALL_DIR=`pwd`
PKG_SYS_FILE="paste(\"${INSTALL_DIR}/inst/scripts/\", name,sep=\"\")"
fi dnl end of if S-Plus.
AC_LINK_IFELSE([AC_LANG_PROGRAM([[
#include "parser.h"
]], [[
extern int xmlSkipBlankChars(xmlParserCtxtPtr ctxt);
xmlParserCtxtPtr p;
xmlSkipBlankChars(p);
]])],[echo "No need for old SKIP_BLANKS definition"
],[BLANKS_DEF="-DOLD_SKIP_BLANKS=1"
])
if test -n "LIBXML2" ; then
TMP_CFLAGS="${CFLAGS}"
CFLAGS="${CFLAGS} -pedantic-errors"
echo "Checking for return type of xmlHashScan element routine."
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#include
#else
#include
#include
#endif
]], [[
void *(*foo)(void *, void *, xmlChar*);
xmlElementTablePtr table;
xmlHashScan(table, foo, NULL);
]])],[echo "xmlHashScan wants a return value."],[echo "No return value for xmlHashScan"; PKG_CPPFLAGS="${PKG_CPPFLAGS} -DNO_XML_HASH_SCANNER_RETURN=1"
])
CFLAGS="${TMP_CFLAGS}"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#include
]], [[
xmlNsPtr ns;
ns->context;
]])],[PKG_CPPFLAGS="$PKG_CPPFLAGS -DLIBXML_NAMESPACE_HAS_CONTEXT=1"; echo "xmlNs has a context field"],[echo "No context field in xmlNs structure."])
fi
CPPFLAGS="$CPPFLAGS -I$R_HOME/include"
echo "Checking for cetype_t enumeration"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#include
]], [[
cetype_t t;
]])],[PKG_CPPFLAGS="${PKG_CPPFLAGS} -DHAVE_R_CETYPE_T=1"; echo "Using recent version of R with cetype_t enumeration type for encoding"],[echo "No cetype_t enumeration defined in R headers."])
AC_SUBST(DUMP_WITH_ENCODING)
PKG_LIBS=${LIBS}
dnl ----------------------------------------------------
dnl Add -m to the linker flags (actually libraries)
dnl to supress errors about multiple definitions of
dnl symbols.
dnl
dnl This is not needed anymore as we no longer define
dnl xmlParserError in our code and override the one
dnl in libmxml. If we do find a situation (e.g. version
dnl of libxml that doesn't allow us to play with routine
dnl pointers to do this, we will need to define
dnl NEED_XML_PARSER_ERROR
dnl
dnl AC_CANONICAL_HOST
dnl Doesn't work on my machine. Where do we find it - R?
if test -n "${NEED_XML_PARSER_ERROR}" ; then
AC_PATH_PROG(UNAME, uname)
if test -n "${UNAME}" ; then
host_os=`${UNAME}`
if test "${host_os}" = "Darwin" ; then
PKG_LIBS="-m $PKG_LIBS"
fi
fi
fi
dnl ---------------------------------------------------
AC_PATH_PROGS(XMLSEC_CONFIG, xmlsec1-config)
if test -n "$XMLSEC_CONFIG" ; then
PKG_CPPFLAGS="$PKG_CPPFLAGS `$XMLSEC_CONFIG --cflags`"
PKG_LIBS="$PKG_LIBS `$XMLSEC_CONFIG --libs`"
XMLSEC_DEFS=-DHAVE_LIBXML_SEC=1
fi
AC_ARG_ENABLE(nodegc, [ --enable-nodegc enable node garbage collection],
[ if test "${enableval}" = "yes" || test "${enableval}" = "default" ; then
LANGUAGE_DEFS="${LANGUAGE_DEFS} -DXML_REF_COUNT_NODES=1"
fi;
echo "enabling nodegc? ${enableval}"],
[echo "nodegc default $enableval"; LANGUAGE_DEFS="${LANGUAGE_DEFS} -DXML_REF_COUNT_NODES=1"])
AC_ARG_ENABLE(xml-debug, [ --enable-xml-debug enable debugging information, primarily for memory management],
[ if test "${enableval}" = "yes" || test "${enableval}" = "default" ; then
LANGUAGE_DEFS="${LANGUAGE_DEFS} -DR_XML_DEBUG=1"
fi;
echo "enabling xml-debug? ${enableval}"],
[echo "xml-debug default $enableval"; LANGUAGE_DEFS="${LANGUAGE_DEFS}"])
dnl ---------------------------------------------------
AC_DEFUN([CHECK_ENUM], [
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
]], [[
int val;
val = $1;
]])],[PKG_CPPFLAGS="$PKG_CPPFLAGS -DHAVE_$1=1"; echo "Version has $1"],[echo "No $1 enumeration value."])
])
CHECK_ENUM(XML_WITH_ZLIB)
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
#ifdef FROM_GNOME_XML_DIR
#include
#else
#include
#endif
]], [[
xmlFeature f;
xmlHasFeature(f);
]])],[PKG_CPPFLAGS="$PKG_CPPFLAGS -DHAVE_XML_HAS_FEATURE=1"; echo "Version has xmlHasFeature()"],[echo "No xmlHasFeature."])
dnl ---------------------------------------------
AC_SUBST(XMLSEC_DEFS)
AC_SUBST(LIBXML2)
AC_SUBST(LANGUAGE_DEFS)
AC_SUBST(LIBXML_INCDIR)
AC_SUBST(XMLPARSE_INCDIR)
AC_SUBST(PKG_LIBS)
AC_SUBST(PKG_CPPFLAGS)
AC_SUBST(SUPPORTS_LIBXML)
AC_SUBST(SUPPORTS_EXPAT)
AC_SUBST(LD_PATH)
AC_SUBST(EXPORT_MEMORY_MANAGEMENT)
if test -n "${USE_SPLUS}" ; then
AC_SUBST(SPLUS)
SPLUS_MAKEFILE=GNUmakefile.Splus
fi
echo ""
echo "****************************************"
echo "Configuration information:"
echo ""
echo "Libxml settings"
echo ""
echo "libxml include directory: ${LIBXML_INCDIR}"
echo "libxml library directory: ${LIBS}"
echo "libxml 2: ${LIBXML2-no}"
echo ""
echo "Compilation flags: ${PKG_CPPFLAGS} ${LANGUAGE_DEFS} $XMLSEC_DEFS"
echo "Link flags: ${PKG_LIBS}"
echo ""
echo "****************************************"
if test "$ADD_XML_OUTPUT_BUFFER" = "no" ; then
ADD_XML_OUTPUT_BUFFER=0
fi
if test "$ADD_XML_OUTPUT_BUFFER" = "yes" ; then
ADD_XML_OUTPUT_BUFFER=1
fi
AC_SUBST(ADD_XML_OUTPUT_BUFFER)
if test -n "${_R_CHECK_TIMINGS_}" ; then
PKG_CPPFLAGS="$PKG_CPPFLAGS -DNO_XML_MEMORY_SHOW_ROUTINE=1"
fi
dnl create the different targets
dnl We had NAMESPACE here when were conditionally exporting the functions to
dnl remove finalizers on a node or document. Need to add NAMESPACE.in in
dnl Install/GNUmakefile.admin
AC_CONFIG_FILES([src/Makevars R/supports.R inst/scripts/RSXML.csh inst/scripts/RSXML.bsh ${SPLUS_MAKEFILE}])
AC_OUTPUT
chmod +x cleanup
dnl Set things up for an S-Plus chapter.
if test -n "${USE_SPLUS}" ; then
mv GNUmakefile.Splus GNUmakefile
echo "Creating S-Plus chapter"
cd src
C_SRC_FILES="DocParse.c RSDTD.c Utils.c"
${SPLUS} CHAPTER ${C_SRC_FILES}
echo "include Makevars" >> makefile
echo 'CFLAGS:= $(PKG_CPPFLAGS) $(CFLAGS)' >> makefile
echo 'LOCAL_LIBS=$(PKG_LIBS) ' >> makefile
cd ..
fi
XML/inst/ 0000755 0001751 0000144 00000000000 13607643014 011711 5 ustar hornik users XML/inst/examples/ 0000755 0001751 0000144 00000000000 14636531033 013527 5 ustar hornik users XML/inst/examples/README 0000644 0001751 0000144 00000000246 13607633730 014415 0 ustar hornik users propmptXML - an R function analogous to prompt for generating
help/documentation files for R functions, but this generates this in
XML format according to Rhelp.dtd.
XML/inst/examples/gettingStarted.xml 0000644 0001751 0000144 00000017713 13607633725 017262 0 ustar hornik users
The idea here is to provide simple examples of how to get started
with processing XML in R using some reasonably straightforward "flat" XML files
and not worrying about efficiency.
An Example: Grades
Here is an example of a simple file in XML containing
grades for students for three different tests.
We might want to turn this into a data frame in R
with a row for each student and four variables,
the name and the scores on the three tests.
Since this is a small file, let's not worry about efficiency in any way.
We can read the entire document tree into memory and make multiple passes
over it to get the information.
Our first approach will be to read the XML into an R tree, i.e.
R-level XML node objects.
We do this with a simple call to xmlTreeParse.
doc = xmlRoot(xmlTreeParse("generic_file.xml"))
We use xmlRoot to get the top-level node of the tree
rather than holding onto the general document information since we won't need it.
Since the structure of this file is
just a list of elements under the root node,
we need only process each of those nodes and turn them into something we want.
The "easiest" way to apply the same function to each child of an XML node
is with the xmlApply function.
What do we want to do for each of the <GRADES> node?
We want to get the value, i.e. the simple text within the node, of each of its children.
Since this is the same for each of the child nodes in <GRADES>, this is again
another call to xmlApply. And since this is all text, we can
simplify the result and get back a character vector rather than a list by using
xmlSApply which will perform this extra simplication step.
So a function to do the initial processing of an individual
<GRADES> node might be
function(node)
xmlSApply(node, xmlValue)
since xmlValue returns the text content within an XML node.
Let's check that this does what we want by calling it on the
first child of the root node.
xmlSApply(doc[[1]], xmlValue)
And indeed it does.
So we can process all the <GRADES> nodes with the command
tmp = xmlSApply(doc, function(x) xmlSApply(x, xmlValue))
The result is a character matrix in which the rows are the variables and the columns
are the records. So let's transpose this.
tmp = t(tmp)
Now, we have finished working with the XML; the rest is regular R programming.
grades = as.data.frame(matrix(as.numeric(tmp[,-1]), 2))
names(grades) = names(doc[[1]])[-1]
grades$Student = tmp[,1]
There seems to be more messing about after we have got the values out of the XML file.
There are several things that might seem more complex but that actually just move
the work to different places, i.e. when we are traversing the XML tree.
Here's another alternative using XPath.
doc = xmlTreeParse("generic_file.xml", useInternal = TRUE)
ans = lapply(c("STUDENT", "TEST1", "TEST2", "FINAL"),
function(var)
unlist(xpathApply(doc, paste("//", var, sep = ""), xmlValue)))
And this gives us a list containing the variables
with the values as character vectors.
as.data.frame(lapply(names(ans),
function(x) if(x != "STUDENT") as.integer(x) else x ))
Another Example: Customer Information List
The second example is another list, this time of description of customers.
The first two nodes in the document are shown below:
ALFKI
Alfreds Futterkiste
Maria Anders
Sales Representative
Obere Str. 57
Berlin
12209
Germany
030-0074321
030-0076545
ANATR
Ana Trujillo Emparedados y helados
Ana Trujillo
Owner
Avda. de la Constitución 2222
México D.F.
05021
Mexico
(5) 555-4729
(5) 555-3745
]]>
We can quickly verify that all the nodes under the root are customers
with the command
doc = xmlRoot(xmlTreeParse("Cust-List.xml"))
table(names(doc))
We see that these are all "Customers".
We could further explore to see if each of these nodes has the same fields.
fields = xmlApply(doc, names)
table(sapply(fields, identical, fields[[1]]))
And the result indicates that about half of them are the same.
Let's see how many unique field names there are:
unique(unlist(fields))
This gives 10. And
we can see how may fields are in each of the Customers nodes with
xmlSApply(doc, xmlSize)
So most of the nodes have most of the fields.
So let's think about a data frame.
What we can do is treat each of the fields as having a simple string value.
Then we can create a data frame with the 10 character columns and with NA values for each
of the records. Thne we will fill this in record at a time.
ans = as.data.frame(replicate(10, character(xmlSize(doc))),
stringsAsFactors = FALSE)
names(ans) = unique(unlist(fields))
Now that we have the skeleton of the answer, we can process each of the
Customers nodes.
Note that we used a global assignemnt in the function to change the
ans in the global environment rather than the local version within the
function call.
Also, we loop over the indices of the
nodes in the tree, i.e. use sapply(1:xmlSize(doc), )
rather than xmlSApply(doc, )
simply because we need to know which row to put the results for each node.
There are various other ways to process these two XML files. One is
to use handler functions to process the internal nodes as they are
being converted from C-level data structures to R objects in a call to
xmlTreeParse. This avoids multiple traversal of the
tree but can seem a little indirect until you get the hang of it. And
some transformations can be cumbersome using this approach as it is a
bottom up transformation.
The event-driven parsing provided by xmlEventParse
is a SAX style approach. This is quite low level and used when
reading the entire XML document into memory and then processing it is prohibitive,
i.e. when the XML file is very, very large.
The use of XPath to perform queries and get subsets of nodes involves
a) learning XPath and b) potentially multiple passes over the tree.
If one has to do many queries, this can be slow overall eventhough each
is very fast. However, if you know XPath or are happy to learn the basics,
this can be quite convenient, avoiding having to write recursive functions
to search for the nodes of interests.
Using the internal nodes (as you must for XPath) also gives you the ability
to go up the tree, i.e. find parent, ancestor and sibling nodes, and not just
down to children. So we have more flexibility in how we traverse the tree.