nokogiri-1.6.1/test_all

#! /usr/bin/env bash
#
#  script to run tests on all relevant rubies, and valgrind on supported rubies.
#  outputs tests to `test.log` and valgrind output to `valgrind.log`.
#
#  requires `rvm` to be installed. sorry about that, multiruby dudes.
#
#  it's worth periodically using hoe-debugger's ability to generate
#  valgrind suppression files to remove spurious valgrind messages
#  (e.g., 1.9.3's glob_helper). ["rake test:valgrind:suppression"]
#

RUBIES="ruby-1.9.3-p327 jruby-1.7.3 jruby-1.6.5.1 jruby-1.6.7.2 ruby-1.9.2-p320"
TEST_LOG=test.log
VALGRIND_LOG=valgrind.log

# Load RVM into a shell session *as a function*
if [[ -s "$HOME/.rvm/scripts/rvm" ]] ; then
    source "$HOME/.rvm/scripts/rvm"
elif [[ -s "/usr/local/rvm/scripts/rvm" ]] ; then
    source "/usr/local/rvm/scripts/rvm"
else
    # nothing below can work without rvm, so report on stderr and stop
    echo "ERROR: An RVM installation was not found." >&2
    exit 1
fi

# truncate both logs before starting
> $TEST_LOG
> $VALGRIND_LOG
set -o errexit

function rvm_use {
    current_ruby=$1
    rvm use "${1}@nokogiri" --create || rvm -v
}

function generate_parser_and_tokenizer {
    old_ruby=$current_ruby
    rvm_use ruby-1.9.3-p327
    bundle exec rake generate 2>&1 > /dev/null
    rvm_use $old_ruby
}

function clean {
    bundle exec rake clean clobber 2>&1 > /dev/null
}

function compile {
    echo "** compiling ..."
    # generate_parser_and_tokenizer
    bundle exec rake compile 2>&1 > /dev/null
}

# install bundler and the gem dependencies for every ruby
for ruby in $RUBIES ; do
    rvm_use ${ruby}
    if ! which bundle ; then
        gem install bundler
    fi
    bundle install --quiet --local || bundle install
    clean
done

# run the test suite on every ruby
for ruby in $RUBIES ; do
    rvm_use ${ruby}
    echo -e "**\n** testing nokogiri on ${ruby}\n**" | tee -a $TEST_LOG
    clean
    compile
    echo "** running tests ..."
    bundle exec rake test 2>&1 | tee -a $TEST_LOG
    clean
done

# run valgrind on every ruby except jruby
for ruby in $RUBIES ; do
    if [[ ! $ruby =~ "jruby" ]] ; then
        rvm_use ${ruby}
        echo -e "**\n** nokogiri prerelease: ${ruby}\n**" | tee -a $VALGRIND_LOG
        clean
        compile
        echo "** running valgrind on tests ..."
        bundle exec rake test:valgrind 2>&1 | tee -a $VALGRIND_LOG
        clean
    fi
done

nokogiri-1.6.1/CHANGELOG.rdoc

=== 1.6.1 / 2013-12-14

* Bugfixes

  * (JRuby) Fix out of memory bug when certain invalid documents are parsed.
  * (JRuby) Fix regression of billion-laughs vulnerability. #586


=== 1.6.0 / 2013-06-08

This release was based on v1.5.10 and 1.6.0.rc1, and contains changes
mentioned in both.

* Deprecations

  * Remove pre 1.9 monitoring from Travis.


=== 1.6.0.rc1 / 2013-04-14

This release was based on v1.5.9, and so does not contain any fixes
mentioned in the notes for v1.5.10.

* Notes

  * mini_portile is now a runtime dependency
  * Ruby 1.9.2 and higher now required

* Features

  * (MRI) Source code for libxml 2.8.0 and libxslt 1.1.26 is packaged
    with the gem. These libraries are compiled at gem install time
    unless the environment variable NOKOGIRI_USE_SYSTEM_LIBRARIES is
    set. VERSION_INFO (also `nokogiri -v`) exposes whether libxml was
    compiled from packaged source, or the system library was used.
  * (Windows) libxml upgraded to 2.8.0

* Deprecations

  * Support for Ruby 1.8.7 and prior has been dropped


=== 1.5.11 / 2013-11-09

* Bugfixes

  * (JRuby) Fix out of memory bug when certain invalid documents are parsed.
  * (JRuby) Fix regression of billion-laughs vulnerability. #568


=== 1.5.10 / 2013-06-07

* Bugfixes

  * (JRuby) Fix "null document" error when parsing an empty IO in jruby 1.7.3. #883
  * (JRuby) Fix schema validation when XSD has DOCTYPE set to DTD. #861 (Thanks, Patrick Cheng!)
  * (MRI) Fix segfault when there is no default subelement for an HTML node. #917

* Notes

  * Use rb_ary_entry instead of RARRAY_PTR (you know, for Rubinius). #877 (Thanks, Dirkjan Bussink!)
  * Fix TypeError when running tests. #900 (Thanks, Cédric Boutillier!)


=== 1.5.9 / 2013-03-21

* Bugfixes

  * Ensure that prefixed attributes are properly namespaced when reparented. #869
  * Fix for inconsistent namespaced attribute access for SVG nested in HTML. #861
  * (MRI) Fixed a memory leak in fragment parsing if nodes are not all subsequently reparented. #856


=== 1.5.8 / 2013-03-19

* Bugfixes

  * (JRuby) Fix EmptyStackException thrown by elements with xlink:href attributes and no base_uri #534, #805. (Thanks, Patrick Quinn and Brian Hoffman!)
  * Fixes duplicate attributes issue introduced in 1.5.7. #865
  * Allow use of a prefixed namespace on a root node using Nokogiri::XML::Builder #868


=== 1.5.7 / 2013-03-18

* Features

  * Windows support for Ruby 2.0.

* Bugfixes

  * SAX::Parser.parse_io throws an error when used with lower case encoding. #828
  * (JRuby) Java Nokogiri is finally green (passes all tests) under 1.8 and 1.9 mode. High five everyone. #798, #705
  * (JRuby) Nokogiri::XML::Reader broken (as a pull parser) on jruby - reads the whole XML document. #831
  * (JRuby) JRuby hangs parsing "&". #837
  * (JRuby) JRuby NPE parsing an invalid XML instruction. #838
  * (JRuby) Node#content= incompatibility. #839
  * (JRuby) to_xhtml doesn't print the last slash for self-closing tags in JRuby. #834
  * (JRuby) Adding an EntityReference after a Text node mangles the entity in JRuby. #835
  * (JRuby) JRuby version inconsistency: nil for empty attributes. #818
  * CSS queries for classes (e.g., ".foo") now treat all whitespace identically. #854
  * Namespace behavior cleaned up and made consistent between JRuby and MRI. #846, #801 (Thanks, Michael Klein!)
  * (MRI) SAX parser handles empty processing instructions. #845


=== 1.5.6 / 2012-12-19

* Features

  * Improved performance of XML::Document#collect_namespaces. #761 (Thanks, Juergen Mangler!)
  * New callback SAX::Document#processing_instruction (Thanks, Kitaiti Makoto!)
  * Node#native_content= allows setting unescaped node content. #768
  * XPath lookup with namespaces supports symbol keys. #729 (Thanks, Ben Langfeld.)
  * XML::Node#[]= stringifies values. #729 (Thanks, Ben Langfeld.)
  * bin/nokogiri will process a document from $stdin
  * bin/nokogiri -e will execute a program from the command line
  * (JRuby) bin/nokogiri --version will print the Xerces and NekoHTML versions.

* Bugfixes

  * Nokogiri now detects XSLT transform errors. #731 (Thanks, Justin Fitzsimmons!)
  * Don't throw an Error when trying to replace top-level text node in DocumentFragment. #775
  * Raise an ArgumentError if an invalid encoding is passed to the SAX parser. #756 (Thanks, Bradley Schaefer!)
  * Prefixed element inconsistency between CRuby and JRuby. #712
  * (JRuby) space prior to xml preamble causes nokogiri to fail parsing. (fixed along with #748) #790
  * (JRuby) Fixed the bug Nokogiri::XML::Node#content inconsistency between Java and C. #794, #797
  * (JRuby) raises INVALID_CHARACTER_ERR exception when EntityReference name starts with '#'. #719
  * (JRuby) doesn't coerce namespaces out of strings on a direct subclass of Node. #715
  * (JRuby) Node#content now renders newlines properly. #737 (Thanks, Piotr Szmielew!)
  * (JRuby) Unknown namespaces are ignored when the recover option is used. #748
  * (JRuby) XPath queries for namespaces should not throw exceptions when called twice in a row. #764
  * (JRuby) More consistent (with libxml2) whitespace formatting when emitting XML. #771
  * (JRuby) namespaced attributes broken when appending raw xml to builder. #770
  * (JRuby) Nokogiri::XML::Document#wrap raises undefined method `length' for nil:NilClass when trying to << to a node. #781
  * (JRuby) Fixed "bad file descriptor" bug when closing open file descriptors. #495
  * (JRuby) JRuby/CRuby incompatibility for attribute decorators. #785
  * (JRuby) Issues parsing valid XML with no internal subset in the DTD. #547, #811
  * (JRuby) Issues parsing valid node content when it contains colons. #728
  * (JRuby) Correctly parse the doc type of html documents. #733
  * (JRuby) Include dtd in the xml output when a builder is used with create_internal_subset. #751
  * (JRuby) builder requires textwrappers for valid utf8 in jruby, not in mri. #784


=== 1.5.5 / 2012-06-24

* Features

  * Much-improved support for JRuby in 1.9 mode! Yay!

* Bugfixes

  * Regression in JRuby Nokogiri add_previous_sibling (1.5.0 -> 1.5.1) #691 (Thanks, John Shahid!)
  * JRuby unable to create HTML doc if URL arg provided #674 (Thanks, John Shahid!)
  * JRuby raises NullPointerException when given HTML document is nil or empty string. #699
  * JRuby 1.9 error, uncaught throw 'encoding_found', has been fixed. #673
  * Invalid encoding returned in JRuby with US-ASCII. #583
  * XmlSaxPushParser raises IndexOutOfBoundsException when over 512 characters are given. #567, #615
  * When xpath evaluation returns empty NodeSet, decorating NodeSet's base document raises exception. #514
  * JRuby raises exception when xpath with namespace is specified. pull request #681 (Thanks, Piotr Szmielew)
  * JRuby renders nodes without their namespace when subclassing Node. #695
  * JRuby raises NAMESPACE_ERR (org.w3c.dom.DOMException) while instantiating RDF::RDFXML::Writer. #683
  * JRuby is not able to use namespaces in xpath. #493
  * JRuby's Entity resolving should be consistent with C-Nokogiri #704, #647, #703


=== 1.5.4 / 2012-06-12

* Features

  * The "nokogiri" script now has more verbose output when passed the `--rng` option. #675 (Thanks, Dan Radez!)
  * Build support on hardened Debian systems that use `-Werror=format-security`. #680.
  * Better build support for systems with pkg-config. #584
  * Better build support for systems with multiple iconv installations.

* Bugfixes

  * Segmentation fault when creating a comment node for a DocumentFragment. #677, #678.
  * Treat '.' as xpath in at() and search(). #690

  * (MRI, Security) Default parse options for XML documents were
    changed to not make network connections during document parsing,
    to avoid XXE vulnerability. #693

    To re-enable this behavior, the configuration method `nononet` may
    be called, like this:

      Nokogiri::XML::Document.parse(xml) { |config| config.nononet }

    Insert your own joke about double-negatives here.


=== 1.5.3 / 2012-06-01

* Features

  * Support for "prefixless" CSS selectors ~, > and + like jQuery supports. #621, #623. (Thanks, David Lee!)
  * Attempting to improve installation on homebrew 0.9 (with regards to iconv). Isn't package management convenient?

* Bugfixes

  * Custom xpath functions with empty nodeset arguments cause a segfault. #634.
  * Nokogiri::XML::Node#css now works for XML documents with default namespaces when the rule contains attribute selector without namespace.
  * Fixed marshalling bugs around how arguments are passed to (and returned from) XSLT custom xpath functions. #640.
  * Nokogiri::XML::Reader#outer_xml is broken in JRuby #617
  * Nokogiri::XML::Attribute on JRuby returns a nil namespace #647
  * Nokogiri::XML::Node#namespace= cannot set a namespace without a prefix on JRuby #648
  * (JRuby) 1.9 mode causes dead lock while running rake #571
  * HTML::Document#meta_encoding does not raise exception on docs with malformed content-type. #655
  * Fixing segfault related to unsupported encodings in in-context parsing on 1.8.7. #643
  * (JRuby) Concurrency issue in XPath parsing. #682


=== 1.5.2 / 2012-03-09

Repackaging of 1.5.1 with a gemspec that is compatible with older Rubies. #631, #632.


=== 1.5.1 / 2012-03-09

* Features

  * XML::Builder#comment allows creation of comment nodes.
  * CSS searches now support namespaced attributes. #593
  * Java integration feature is added. Now, XML::Document.wrap
    and XML::Document#to_java methods are available.
  * RelaxNG validator support in the `nokogiri` cli utility. #591 (thanks, Dan Radez!)

* Bugfixes

  * Fix many memory leaks and segfault opportunities. Thanks, Tim Elliott!
  * extconf searches homebrew paths if homebrew is installed.
  * Inconsistent behavior of Nokogiri 1.5.0 Java #620
  * Inheriting from Nokogiri::XML::Node on JRuby (1.6.4/5) fails #560
  * XML::Attr nodes are not allowed to be added as node children, so an exception is raised. #558
  * No longer defensively "pickle" adjacent text nodes on Node#add_next_sibling and Node#add_previous_sibling calls. #595.
  * Java version inconsistency: it returns nil for empty attributes #589
  * to_xhtml generates incorrect output when tag is empty #557
  * Document#add_child now accepts a Node, NodeSet, DocumentFragment, or String. #546.
  * Document#create_element now recognizes namespaces containing
    non-word characters (like "SOAP-ENV"). This is mostly relevant to
    users of Builder, which calls Document#create_element for nearly
    everything. #531.
  * File encoding broken in 1.5.0 / jruby / windows #529
  * Java version does not return namespace defs as attrs for ::HTML #542
  * Bad file descriptor with Nokogiri 1.5.0 #495
  * remove_namespace! doesn't work in pure java version #492
  * The Nokogiri Java native build throws a null pointer exception
    when ActiveSupport's .blank? method is called directly on a parsed
    object. #489
  * 1.5.0 Not using correct character encoding #488
  * Raw XML string in XML Builder broken on JRuby #486
  * Nokogiri 1.5.0 XML generation broken on JRuby #484
  * Do not allow multiple root nodes. #550
  * Fixes for custom XPath functions. #605, #606 (thanks, Juan Wajnerman!)
  * Node#to_xml does not override :save_with if it is provided. #505
  * Node#set is a private method (JRuby). #564 (thanks, Nick Sieger!)
  * C14n cleanup and Node#canonicalize (thanks, Ivan Pirlik!) #563


=== 1.5.0 / 2011-07-01

* Notes

  * See changelog from 1.4.7

* Features

  * extracted sets of Node::SaveOptions into Node::SaveOptions::DEFAULT_{X,H,XH}TML (refactor)

* Bugfixes

  * default output of XML on JRuby is no longer formatted due to
    inconsistent whitespace handling. #415
  * (JRuby) making empty NodeSets with null `nodes` member safe to operate on. #443
  * Fix a bug in advanced encoding detection that leads to partially
    duplicated document when parsing an HTML file with unknown
    encoding.
  * Add support for .


=== 1.5.0 beta3 / 2010/12/02

* Notes

  * JRuby performance tuning
  * See changelog from 1.4.4

* Bugfixes

  * Node#inner_text no longer returns nil. (JRuby) #264


=== 1.5.0 beta2 / 2010/07/30

* Notes

  * See changelog from 1.4.3


=== 1.5.0 beta1 / 2010/05/22

* Notes

  * JRuby support is provided by a new pure-java backend.

* Deprecations

  * Ruby 1.8.6 is deprecated. Nokogiri will install, but official support is ended.
  * LibXML 2.6.16 and earlier are deprecated. Nokogiri will refuse to install.
  * FFI support is removed.


=== 1.4.7 / 2011-07-01

* Bugfixes

  * Fix a bug in advanced encoding detection that leads to partially
    duplicated document when parsing an HTML file with unknown
    encoding. Thanks, Timothy Elliott (@ender672)! #478


=== 1.4.6 / 2011-06-19

* Notes

  * This version is functionally identical to 1.4.5.
  * Ruby 1.8.6 support has been restored.


=== 1.4.5 / 2011-05-19

* New Features

  * Nokogiri::HTML::Document#title accessor gets and sets the document title.
  * extracted sets of Node::SaveOptions into Node::SaveOptions::DEFAULT_{X,H,XH}TML (refactor)
  * Raise an exception if a string is passed to Nokogiri::XML::Schema#validate. #406

* Bugfixes

  * Node#serialize-and-friends now accepts a SaveOption object as the, erm, save object.
  * Nokogiri::CSS::Parser has-a Nokogiri::CSS::Tokenizer
  * (JRUBY+FFI only) Weak references are now threadsafe. #355
  * Make direct start_element() callback (currently used for
    HTML::SAX::Parser) pass attributes in assoc array, just as
    emulated start_element() callback does. rel. #356
  * HTML::SAX::Parser should call back a block given to parse*() if any, just as XML::SAX::Parser does.
  * Add further encoding detection to HTML parser that libxml2 does not do.
  * Document#remove_namespaces! now handles attributes with namespaces. #396
  * XSLT::Stylesheet#transform no longer segfaults when handed a non-XML::Document. #452
  * XML::Reader no longer segfaults when under GC pressure. #439


=== 1.4.4 / 2010-11-15

* New Features

  * XML::Node#children= sets the node's inner html (much like #inner_html=), but returns the reparented node(s).
  * XSLT supports function extensions. #336
  * XPath bind parameter substitution. #329
  * XML::Reader node type constants. #369
  * SAX Parser context provides line and column information

* Bugfixes

  * XML::DTD#attributes returns an empty hash instead of nil when there are no attributes.
  * XML::DTD#{keys,each} now work as expected. #324
  * {XML,HTML}::DocumentFragment.{new,parse} no longer strip leading and trailing whitespace. #319
  * XML::Node#{add_child,add_previous_sibling,add_next_sibling,replace} return a NodeSet when passed a string.
  * Unclosed tags parsed more robustly in fragments. #315
  * XML::Node#{replace,add_previous_sibling,add_next_sibling} edge cases fixed related to libxml's text node merging. #308
  * Fixed a segfault when GC occurs during xpath handler argument marshalling. #345
  * Added hack to Slop decorator to work with previously defined methods. #330
  * Fix a memory leak when duplicating child nodes. #353
  * Fixed off-by-one bug with nth-last-{child,of-type} CSS selectors when NOT using an+b notation. #354
  * Fixed passing of non-namespace attributes to SAX::Document#start_element. #356
  * Workaround for libxml2 in-context parsing bug. #362
  * Fixed NodeSet#wrap on nodes within a fragment. #331


=== 1.4.3 / 2010/07/28

* New Features

  * XML::Reader#empty_element? returns true for empty elements. #262
  * Node#remove_namespaces! now removes namespace *declarations* as well. #294
  * NodeSet#at_xpath, NodeSet#at_css and NodeSet#> do what the corresponding
    methods of Node do.

* Bugfixes

  * XML::NodeSet#{include?,delete,push} accept an XML::Namespace
  * XML::Document#parse added for parsing in the context of a document
  * XML::DocumentFragment#inner_html= works with contextual parsing! #298, #281
  * lib/nokogiri/css/parser.y Combined CSS functions + pseudo selectors fixed
  * Reparenting text nodes is safe, even when the operation frees adjacent merged nodes. #283
  * Fixed libxml2 versionitis issue with xmlFirstElementChild et al. #303
  * XML::Attr#add_namespace now works as expected. #252
  * HTML::DocumentFragment uses the string's encoding. #305
  * Fix the CSS3 selector translation rule for the general sibling
    combinator (a.k.a. preceding selector) that incorrectly converted
    "E ~ F G" to "//F//G[preceding-sibling::E]".


=== 1.4.2 / 2010/05/22

* New Features

  * XML::Node#parse will parse XML or HTML fragments with respect to the
    context node.
  * XML::Node#namespaces returns all namespaces defined in the node and all
    ancestor nodes (previously did not return ancestors' namespace definitions).
  * Added Enumerable to XML::Node
  * Nokogiri::XML::Schema#validate now uses xmlSchemaValidateFile if a
    filename is passed, which is faster and more memory-efficient. GH #219
  * XML::Document#create_entity will create new EntityDecl objects. GH #174
  * JRuby FFI implementation no longer uses ObjectSpace._id2ref,
    instead using Charles Nutter's rocking Weakling gem.
  * Nokogiri::XML::Node#first_element_child fetch the first child node that is an ELEMENT node.
  * Nokogiri::XML::Node#last_element_child fetch the last child node that is an ELEMENT node.
  * Nokogiri::XML::Node#elements fetch all children nodes that are ELEMENT nodes.
  * Nokogiri::XML::Node#add_child, #add_previous_sibling, #before,
    #add_next_sibling, #after, #inner_html, #swap and #replace all now
    accept a Node, DocumentFragment, NodeSet, or a string containing
    markup.
  * Node#fragment? indicates whether a node is a DocumentFragment.

* Bugfixes

  * XML::NodeSet is now always decorated (if the document has decorators). GH #198
  * XML::NodeSet#slice gracefully handles offset+length larger than the set length. GH #200
  * XML::Node#content= safely unlinks previous content. GH #203
  * XML::Node#namespace= takes nil as a parameter
  * XML::Node#xpath returns things other than NodeSet objects. GH #208
  * XSLT::StyleSheet#transform accepts hashes for parameters. GH #223
  * Pseudo selectors inside not() work. GH #205
  * XML::Builder doesn't break when nodes are unlinked.
    Thanks to vihai! GH #228
  * Encoding can be forced on the SAX parser. Thanks Eugene Pimenov! GH #204
  * XML::DocumentFragment uses XML::Node#parse to determine children.
  * Fixed a memory leak in xml reader. Thanks sdor! GH #244
  * Node#replace returns the new child node as claimed in the
    RDoc. Previously returned +self+.

* Notes

  * The Windows gems now bundle DLLs for libxml 2.7.6 and libxslt
    1.1.26. Prior to this release, libxml 2.7.3 and libxslt 1.1.24
    were bundled.


=== 1.4.1 / 2009/12/10

* New Features

  * Added Nokogiri::LIBXML_ICONV_ENABLED
  * Alias Node#[] to Node#attr
  * XML::Node#next_element added
  * XML::Node#> added for searching a node's immediate children
  * XML::NodeSet#reverse added
  * Added fragment support to Node#add_child, Node#add_next_sibling,
    Node#add_previous_sibling, and Node#replace.
  * XML::Node#previous_element implemented
  * Rubinius support
  * The CSS selector engine now supports :has()
  * XML::NodeSet#filter() was added
  * XML::Node.next= and .previous= are aliases for add_next_sibling and add_previous_sibling. GH #183

* Bugfixes

  * XML fragments with namespaces do not raise an exception (regression in 1.4.0)
  * Node#matches? works in nodes contained by a DocumentFragment. GH #158
  * Document should not define add_namespace() method. GH #169
  * XPath queries returning namespace declarations do not segfault.
  * Node#replace works with nodes from different documents. GH #162
  * Adding XML::Document#collect_namespaces
  * Fixed bugs in the SOAP4R adapter
  * Fixed bug in XML::Node#next_element for certain edge cases
  * Fixed load path issue with JRuby under Windows. GH #160.
  * XSLT#apply_to will honor the "output method". Thanks richardlehane!
  * Fragments containing leading text nodes with newlines now parse properly. GH #178.


=== 1.4.0 / 2009/10/30

* Happy Birthday!

* New Features

  * Node#at_xpath returns the first element of the NodeSet matching the XPath
    expression.
  * Node#at_css returns the first element of the NodeSet matching the CSS
    selector.
  * NodeSet#| for unions GH #119 (Thanks Serabe!)
  * NodeSet#inspect makes prettier output
  * Node#inspect implemented for more rubyish document inspecting
  * Added XML::DTD#external_id
  * Added XML::DTD#system_id
  * Added XML::ElementContent for DTD Element content validity
  * Better namespace declaration support in Nokogiri::XML::Builder
  * Added XML::Node#external_subset
  * Added XML::Node#create_external_subset
  * Added XML::Node#create_internal_subset
  * XML Builder can append raw strings (GH #141, patch from dudleyf)
  * XML::SAX::ParserContext added
  * XML::Document#remove_namespaces! for the namespace-impaired

* Bugfixes

  * returns nil when HTML documents do not declare a meta encoding tag. GH #115
  * Uses RbConfig::CONFIG['host_os'] to adjust ENV['PATH'] GH #113
  * NodeSet#search is more efficient GH #119 (Thanks Serabe!)
  * NodeSet#xpath handles custom xpath functions
  * Fixing a SEGV when XML::Reader gets attributes for current node
  * Node#inner_html takes the same arguments as Node#to_html GH #117
  * DocumentFragment#css delegates to its child nodes GH #123
  * NodeSet#[] works with slices larger than NodeSet#length GH #131
  * Reparented nodes maintain their namespace. GH #134
  * Fixed SEGV when adding an XML::Document to NodeSet
  * XML::SyntaxError can be duplicated. GH #148

* Deprecations

  * Hpricot compatibility layer removed


=== 1.3.3 / 2009/07/26

* New Features

  * NodeSet#children returns all children of all nodes

* Bugfixes

  * Override libxml-ruby's global error handler
  * ParseOption#strict fixed
  * Fixed a segfault when sending an empty string to Node#inner_html= GH #88
  * String encoding is now set to UTF-8 in Ruby 1.9
  * Fixed a segfault when moving root nodes between documents. GH #91
  * Fixed an O(n) penalty on node creation. GH #101
  * Allowing XML documents to be output as HTML documents

* Deprecations

  * Hpricot compatibility layer will be removed in 1.4.0


=== 1.3.2 / 2009-06-22

* New Features

  * Nokogiri::XML::DTD#validate will validate your document

* Bugfixes

  * Nokogiri::XML::NodeSet#search will search top level nodes. GH #73
  * Removed namespace related methods from Nokogiri::XML::Document
  * Fixed a segfault when a namespace was added twice
  * Made nokogiri work with Snow Leopard GH #79
  * Mailing list has moved to: http://groups.google.com/group/nokogiri-talk
  * HTML fragments now correctly handle comments and CDATA blocks. GH #78
  * Nokogiri::XML::Document#clone is now an alias of dup

* Deprecations

  * Nokogiri::XML::SAX::Document#start_element_ns is deprecated, please switch
    to Nokogiri::XML::SAX::Document#start_element_namespace
  * Nokogiri::XML::SAX::Document#end_element_ns is deprecated, please switch
    to Nokogiri::XML::SAX::Document#end_element_namespace


=== 1.3.1 / 2009-06-07

* Bugfixes

  * extconf.rb checks for optional RelaxNG and Schema functions
  * Namespace nodes are added to the Document node cache


=== 1.3.0 / 2009-05-30

* New Features

  * Builder changes scope based on block arity
  * Builder supports methods ending in underscore similar to tagz
  * Nokogiri::XML::Node#<=> compares nodes based on Document position
  * Nokogiri::XML::Node#matches? returns true if Node can be found with
    given selector.
  * Nokogiri::XML::Node#ancestors now returns a Nokogiri::XML::NodeSet
  * Nokogiri::XML::Node#ancestors will match parents against optional selector
  * Nokogiri::HTML::Document#meta_encoding for getting the meta encoding
  * Nokogiri::HTML::Document#meta_encoding= for setting the meta encoding
  * Nokogiri::XML::Document#encoding= to set the document encoding
  * Nokogiri::XML::Schema for validating documents against XSD schema
  * Nokogiri::XML::RelaxNG for validating documents against RelaxNG schema
  * Nokogiri::HTML::ElementDescription for fetching HTML element descriptions
  * Nokogiri::XML::Node#description to fetch the node description
  * Nokogiri::XML::Node#accept implements Visitor pattern
  * bin/nokogiri for easily examining documents (Thanks Yutaka HARA!)
  * Nokogiri::XML::NodeSet now supports more Array and Enumerable operators:
    index, delete, slice, - (difference), + (concatenation), & (intersection),
    push, pop, shift, ==
  * Nokogiri.XML, Nokogiri.HTML take blocks that receive
    Nokogiri::XML::ParseOptions objects
  * Nokogiri::XML::Node#namespace returns a Nokogiri::XML::Namespace
  * Nokogiri::XML::Node#namespace= for setting a node's namespace
  * Nokogiri::XML::DocumentFragment and Nokogiri::HTML::DocumentFragment
    have a sensible API and a more robust implementation.
  * JRuby 1.3.0 support via FFI.

* Bugfixes

  * Fixed a problem with nil passed to CDATA constructor
  * Fragment method deals with regular expression characters
    (Thanks Joel!) LH #73
  * Fixing builder scope issues LH #61, LH #74, LH #70
  * Fixed a problem when adding a child could remove the child namespace LH#78
  * Fixed bug with unlinking a node then reparenting it. (GH#22)
  * Fixed failure to catch errors during XSLT parsing (GH#32)
  * Fixed a bug with attribute conditions in CSS selectors (GH#36)
  * Fixed intolerance of HTML attributes without values in Node#before/after/inner_html=. (GH#35)


=== 1.2.3 / 2009-03-22

* Bugfixes

  * Fixing bug where a node is passed in to Node#new
  * Namespace should be assigned on DocumentFragment creation. LH #66
  * Nokogiri::XML::NodeSet#dup works GH #10
  * Nokogiri::HTML returns an empty Document when given a blank string GH#11
  * Adding a child will remove duplicate namespace declarations LH #67
  * Builder methods take a hash as a second argument


=== 1.2.2 / 2009-03-14

* New features

  * Nokogiri may be used with soap4r. See XSD::XMLParser::Nokogiri
  * Nokogiri::XML::Node#inner_html= to set the inner html for a node
  * Nokogiri builder interface improvements
  * Nokogiri::XML::Node#swap swaps html for current node (LH #50)

* Bugfixes

  * Fixed a tag nesting problem in the Builder API (LH #41)
  * Nokogiri::HTML.fragment will properly handle text only nodes (LH #43)
  * Nokogiri::XML::Node#before will prepend text nodes (LH #44)
  * Nokogiri::XML::Node#after will append text nodes
  * Nokogiri::XML::Node#search automatically registers root namespaces (LH #42)
  * Nokogiri::XML::NodeSet#search automatically registers namespaces
  * Nokogiri::HTML::NamedCharacters delegates to libxml2
  * Nokogiri::XML::Node#[] can take a symbol (LH #48)
  * vasprintf for windows updated. Thanks Geoffroy Couprie!
  * Nokogiri::XML::Node#[]= should not encode entities (LH #55)
  * Namespaces should be copied to reparented nodes (LH #56)
  * Nokogiri uses encoding set on the string for default in Ruby 1.9
  * Document#dup should create a new document of the same type (LH #59)
  * Document should not have a parent method (LH #64)


=== 1.2.1 / 2009-02-23

* Bugfixes

  * Fixed a CSS selector space bug
  * Fixed Ruby 1.9 String Encoding (Thanks 角谷さん!)


=== 1.2.0 / 2009-02-22

* New features

  * CSS search now supports CSS3 namespace queries
  * Namespaces on the root node are automatically registered
  * CSS queries use the default namespace
  * Nokogiri::XML::Document#encoding get encoding used for this document
  * Nokogiri::XML::Document#url get the document url
  * Nokogiri::XML::Node#add_namespace add a namespace to the node LH#38
  * Nokogiri::XML::Node#each iterate over attribute name, value pairs
  * Nokogiri::XML::Node#keys get all attribute names
  * Nokogiri::XML::Node#line get the line number for a node (Thanks Dirkjan Bussink!)
  * Nokogiri::XML::Node#serialize now takes an optional encoding parameter
  * Nokogiri::XML::Node#to_html, to_xml, and to_xhtml take an optional encoding
  * Nokogiri::XML::Node#to_str
  * Nokogiri::XML::Node#to_xhtml to produce XHTML documents
  * Nokogiri::XML::Node#values get all attribute values
  * Nokogiri::XML::Node#write_to writes the node to an IO object with optional encoding
  * Nokogiri::XML::ProcessingInstruction.new
  * Nokogiri::XML::SAX::PushParser for all your push parsing needs.

* Bugfixes

  * Fixed Nokogiri::XML::Document#dup
  * Fixed header detection. Thanks rubikitch!
  * Fixed a problem where invalid CSS would cause the parser to hang

* Deprecations

  * Nokogiri::XML::Node.new_from_str will be deprecated in 1.3.0

* API Changes

  * Nokogiri::HTML.fragment now returns an XML::DocumentFragment (LH #32)


=== 1.1.1

* New features

  * Added XML::Node#elem?
  * Added XML::Node#attribute_nodes
  * Added XML::Attr
  * XML::Node#delete added.
  * XML::NodeSet#inner_html added.

* Bugfixes

  * Not including an HTML entity for \r for HTML nodes.
  * Removed CSS::SelectorHandler and XML::XPathHandler
  * XML::Node#attributes returns an Attr node for the value.
  * XML::NodeSet implements to_xml


=== 1.1.0

* New Features

  * Custom XPath functions are now supported. See Nokogiri::XML::Node#xpath
  * Custom CSS pseudo classes are now supported. See Nokogiri::XML::Node#css
  * Nokogiri::XML::Node#<< will add a child to the current node

* Bugfixes

  * Mutex lock on CSS cache access
  * Fixed build problems with GCC 3.3.5
  * XML::Node#to_xml now takes an indentation argument
  * XML::Node#dup takes an optional depth argument
  * XML::Node#add_previous_sibling returns new sibling node.


=== 1.0.7

* Bugfixes

  * Fixed memory leak when using Dike
  * SAX parser now parses IO streams
  * Comment nodes have their own class
  * Nokogiri() should delegate to Nokogiri.parse()
  * Prepending rather than appending to ENV['PATH'] on windows
  * Fixed a bug in complex CSS negation selectors


=== 1.0.6

* 5 Bugfixes

  * XPath Parser raises a SyntaxError on parse failure
  * CSS Parser raises a SyntaxError on parse failure
  * filter() and not() hpricot compatibility added
  * CSS searches via Node#search are now always relative
  * CSS to XPath conversion is now cached


=== 1.0.5

* Bugfixes

  * Added mailing list and ticket tracking information to the README.txt
  * Sets ENV['PATH'] on windows if it doesn't exist
  * Caching results of NodeSet#[] on Document


=== 1.0.4

* Bugfixes

  * Changed memory management from weak refs to document refs
  * Plugged some memory leaks
  * Builder blocks can call methods from surrounding contexts


=== 1.0.3

* 5 Bugfixes

  * NodeSet now implements to_ary
  * XML::Document should not implement parent
  * More GC Bugs fixed. (Mike is AWESOME!)
  * Removed RARRAY_LEN for 1.8.5 compatibility. Thanks Shane Hanna.
  * inner_html fixed. (Thanks Yehuda!)


=== 1.0.2

* 1 Bugfix

  * extconf.rb should not check for frex and racc


=== 1.0.1

* 1 Bugfix

  * Made sure extconf.rb searched libdir and prefix so that ports libxml/ruby
    will link properly. Thanks lucsky!


=== 1.0.0 / 2008-07-13

* 1 major enhancement

  * Birthday!

nokogiri-1.6.1/.travis.yml0000644000175000017500000000057212261213762015035 0ustar boutilboutil
language: ruby
rvm:
  - 1.9.2
  - 1.9.3
  - ruby-head
  - ree
  - jruby-19mode
  - rbx-19mode
jdk:
  - openjdk7
  - openjdk6
matrix:
  allow_failures:
    - rvm: rbx-19mode
  exclude:
    - rvm: 1.9.2
      jdk: openjdk7
    - rvm: 1.9.3
      jdk: openjdk7
    - rvm: ruby-head
      jdk: openjdk7
    - rvm: ree
      jdk: openjdk7
    - rvm: rbx-19mode
      jdk: openjdk7
nokogiri-1.6.1/README.rdoc0000644000175000017500000001231412261213762014527 0ustar boutilboutil
= Nokogiri {}[http://travis-ci.org/sparklemotion/nokogiri] {}[https://codeclimate.com/github/sparklemotion/nokogiri]

* http://nokogiri.org
* http://github.com/sparklemotion/nokogiri/wikis
* http://github.com/sparklemotion/nokogiri/tree/master
* http://groups.google.com/group/nokogiri-talk
* http://github.com/sparklemotion/nokogiri/issues

== DESCRIPTION:

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's
many features is the ability to search documents via XPath or CSS3 selectors.

XML is like violence - if it doesn’t solve your problems, you are not using
enough of it.

== FEATURES:

* XPath support for document searching
* CSS3 selector support for document searching
* XML/HTML builder

Nokogiri parses and searches XML/HTML very quickly, and also has
correctly implemented CSS3 selector support as well as XPath support.

== SUPPORT:

Before filing a bug report, please read our {submission guidelines}[http://nokogiri.org/tutorials/getting_help.html] at:

* http://nokogiri.org/tutorials/getting_help.html

The Nokogiri {mailing list}[http://groups.google.com/group/nokogiri-talk] is available here:

* http://groups.google.com/group/nokogiri-talk

The {bug tracker}[http://github.com/sparklemotion/nokogiri/issues] is available here:

* http://github.com/sparklemotion/nokogiri/issues

The IRC channel is #nokogiri on freenode.

== SYNOPSIS:

  require 'nokogiri'
  require 'open-uri'

  # Get a Nokogiri::HTML::Document for the page we’re interested in...

  doc = Nokogiri::HTML(open('http://www.google.com/search?q=sparklemotion'))

  # Do funky things with it using Nokogiri::XML::Node methods...

  ####
  # Search for nodes by css
  doc.css('h3.r a').each do |link|
    puts link.content
  end

  ####
  # Search for nodes by xpath
  doc.xpath('//h3/a').each do |link|
    puts link.content
  end

  ####
  # Or mix and match.
  doc.search('h3.r a.l', '//h3/a').each do |link|
    puts link.content
  end

== REQUIREMENTS:

* ruby 1.9.2 or higher
* libxml2
* libxml2-dev
* libxslt
* libxslt-dev

== ENCODING:

Strings are always stored as UTF-8 internally. Methods that return text
values will always return UTF-8 encoded strings. Methods that return XML
(like to_xml, to_html and inner_html) will return a string encoded like the
source document.

*WARNING*

Some documents declare one particular encoding, but use a different one.
So, which encoding should the parser choose?

Remember that data is just a stream of bytes. Only we humans add meaning
to that stream. Any particular set of bytes could be valid characters in
multiple encodings, so detecting encoding with 100% accuracy is not
possible. libxml2 does its best, but it can't be right 100% of the time.

If you want Nokogiri to handle the document encoding properly, your best
bet is to explicitly set the encoding. Here is an example of explicitly
setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('', nil, 'EUC-JP')

== INSTALL:

* sudo gem install nokogiri

=== Binary packages

Binary packages are available for:

* SuSE[http://download.opensuse.org/repositories/devel:/languages:/ruby:/extensions/]
* Fedora[http://s390.koji.fedoraproject.org/koji/packageinfo?packageID=6756]

== DEVELOPMENT:

=== Developing on C Ruby (MRI)

Developing Nokogiri requires racc and rexical to generate the parser and
tokenizer.

To start development, make sure you have `libxml2` and `libxslt` installed.
Then install core gems and bootstrap:

  $ gem install hoe rake-compiler mini_portile
  $ rake newb

=== Developing on JRuby

Currently, development with JRuby depends on CRuby being installed. With
CRuby, install racc and rexical:

  $ gem install racc rexical

Make sure hoe and rake compiler are installed with JRuby:

  $ jgem install hoe rake-compiler

Then run rake:

  $ jruby -S rake

== LICENSE:

(The MIT License)

Copyright (c) 2008 - 2012:

* {Aaron Patterson}[http://tenderlovemaking.com]
* {Mike Dalessio}[http://mike.daless.io]
* {Charles Nutter}[http://blog.headius.com]
* {Sergio Arbeo}[http://www.serabe.com]
* {Patrick Mahoney}[http://polycrystal.org]
* {Yoko Harada}[http://yokolet.blogspot.com]

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
nokogiri-1.6.1/C_CODING_STYLE.rdoc0000644000175000017500000000115212261213762015755 0ustar boutilboutil
= C/C++ mode style for Nokogiri

Please don't propose commits that only change whitespace.
However, if your commit touches a function or section that is not using
MRI Ruby conventions, feel free to update whitespace in the surrounding code.

= WHITESPACE:

* indent level: 2
* indent type: Always spaces
* line Breaks: LF

This style can be automatically applied by running:

  astyle --indent=spaces=2 --style=1tbs --keep-one-line-blocks $(ack -f --type=cpp --type=cc ext/nokogiri)

= FUNCTION DECLARATION:

ANSI C style:

  type name(args)
  {
    declarations

    code
  }

= SOURCES:

* <3<3<3
nokogiri-1.6.1/ROADMAP.md0000644000175000017500000000600112261213762014322 0ustar boutilboutil
# Roadmap for 2.0

## overhaul serialize/pretty printing API

* https://github.com/sparklemotion/nokogiri/issues/530
  XHTML formatting can't be turned off

* https://github.com/sparklemotion/nokogiri/issues/415
  XML formatting should be no formatting

## overhaul and optimize the SAX parsing

* see fairy wing throwdown - SAX parsing is wicked slow.

## Node should not be Enumerable; and should have a better attributes API

* https://github.com/sparklemotion/nokogiri/issues/679
  Mixing in Enumerable has some unintended consequences; plus we want to improve the attributes API

* Some ideas for a better attributes API?
  * (closed) https://github.com/sparklemotion/nokogiri/issues/666
  * https://github.com/sparklemotion/nokogiri/issues/765

## improve CSS query parsing

* https://github.com/sparklemotion/nokogiri/issues/528
  support `:not()` with a nontrivial argument, like `:not(div p.c)`

* https://github.com/sparklemotion/nokogiri/issues/451
  chained :not pseudoselectors

* better jQuery selector and CSS pseudo-selector support:
  * https://github.com/sparklemotion/nokogiri/issues/621
  * https://github.com/sparklemotion/nokogiri/issues/342
  * https://github.com/sparklemotion/nokogiri/issues/628
  * https://github.com/sparklemotion/nokogiri/issues/652
  * https://github.com/sparklemotion/nokogiri/issues/688

* https://github.com/sparklemotion/nokogiri/issues/394
  nth-of-type is wrong, and possibly other selectors as well

* https://github.com/sparklemotion/nokogiri/issues/309
  incorrect query being executed

* https://github.com/sparklemotion/nokogiri/issues/350
  :has is wrong?

## DocumentFragment

* there are a few tickets about searches not working properly if you
  use or do not use the context node as part of the search.
  - https://github.com/sparklemotion/nokogiri/issues/213
  - https://github.com/sparklemotion/nokogiri/issues/370
  - https://github.com/sparklemotion/nokogiri/issues/454
  - https://github.com/sparklemotion/nokogiri/issues/572

## Better Syntax for custom XPath function handler

* https://github.com/sparklemotion/nokogiri/pull/464

## Better Syntax around Node#xpath and NodeSet#xpath

* look at those methods, and use of Node#extract_params in Node#{css,search}
  * we should standardize on a hash of options for these and other calls
* what should NodeSet#xpath return?
  * https://github.com/sparklemotion/nokogiri/issues/656
* also, clean up or unify the implementations of #xpath-and-friends in Node and NodeSet
  * implementations are very similar, but no shared code :(
* decorate nodes in a consistent manner

## Encoding

We have a lot of issues open around encoding. How bad are things?
Would it help if we deprecated support for Ruby 1.8.7? Somebody who
knows encoding well should head this up.

* Extract EncodingReader as a real object that can be injected
  https://groups.google.com/forum/#!msg/nokogiri-talk/arJeAtMqvkg/tGihB-iBRSAJ

## Reader

It's fundamentally broken, in that we can't stop people from crashing
their application if they want to use object reference unsafely.
nokogiri-1.6.1/Manifest.txt0000644000175000017500000002345112261213762015234 0ustar boutilboutil
.autotest .gemtest .travis.yml CHANGELOG.ja.rdoc CHANGELOG.rdoc C_CODING_STYLE.rdoc Gemfile Manifest.txt README.ja.rdoc README.rdoc ROADMAP.md Rakefile STANDARD_RESPONSES.md Y_U_NO_GEMSPEC.md bin/nokogiri build_all dependencies.yml ext/java/nokogiri/EncodingHandler.java ext/java/nokogiri/HtmlDocument.java ext/java/nokogiri/HtmlElementDescription.java ext/java/nokogiri/HtmlEntityLookup.java ext/java/nokogiri/HtmlSaxParserContext.java ext/java/nokogiri/NokogiriService.java ext/java/nokogiri/XmlAttr.java ext/java/nokogiri/XmlAttributeDecl.java ext/java/nokogiri/XmlCdata.java ext/java/nokogiri/XmlComment.java ext/java/nokogiri/XmlDocument.java ext/java/nokogiri/XmlDocumentFragment.java ext/java/nokogiri/XmlDtd.java ext/java/nokogiri/XmlElement.java ext/java/nokogiri/XmlElementContent.java ext/java/nokogiri/XmlElementDecl.java ext/java/nokogiri/XmlEntityDecl.java ext/java/nokogiri/XmlEntityReference.java ext/java/nokogiri/XmlNamespace.java ext/java/nokogiri/XmlNode.java ext/java/nokogiri/XmlNodeSet.java ext/java/nokogiri/XmlProcessingInstruction.java ext/java/nokogiri/XmlReader.java ext/java/nokogiri/XmlRelaxng.java ext/java/nokogiri/XmlSaxParserContext.java ext/java/nokogiri/XmlSaxPushParser.java ext/java/nokogiri/XmlSchema.java ext/java/nokogiri/XmlSyntaxError.java ext/java/nokogiri/XmlText.java ext/java/nokogiri/XmlXpathContext.java ext/java/nokogiri/XsltStylesheet.java ext/java/nokogiri/internals/ClosedStreamException.java ext/java/nokogiri/internals/HtmlDomParserContext.java
ext/java/nokogiri/internals/NokogiriBlockingQueueInputStream.java ext/java/nokogiri/internals/NokogiriDocumentCache.java ext/java/nokogiri/internals/NokogiriDomParser.java ext/java/nokogiri/internals/NokogiriEncodingReaderWrapper.java ext/java/nokogiri/internals/NokogiriEntityResolver.java ext/java/nokogiri/internals/NokogiriErrorHandler.java ext/java/nokogiri/internals/NokogiriHandler.java ext/java/nokogiri/internals/NokogiriHelpers.java ext/java/nokogiri/internals/NokogiriNamespaceCache.java ext/java/nokogiri/internals/NokogiriNamespaceContext.java ext/java/nokogiri/internals/NokogiriNonStrictErrorHandler.java ext/java/nokogiri/internals/NokogiriNonStrictErrorHandler4NekoHtml.java ext/java/nokogiri/internals/NokogiriStrictErrorHandler.java ext/java/nokogiri/internals/NokogiriXPathFunction.java ext/java/nokogiri/internals/NokogiriXPathFunctionResolver.java ext/java/nokogiri/internals/NokogiriXPathVariableResolver.java ext/java/nokogiri/internals/NokogiriXsltErrorListener.java ext/java/nokogiri/internals/ParserContext.java ext/java/nokogiri/internals/ReaderNode.java ext/java/nokogiri/internals/SaveContextVisitor.java ext/java/nokogiri/internals/SchemaErrorHandler.java ext/java/nokogiri/internals/UncloseableInputStream.java ext/java/nokogiri/internals/XmlDeclHandler.java ext/java/nokogiri/internals/XmlDomParserContext.java ext/java/nokogiri/internals/XmlSaxParser.java ext/java/nokogiri/internals/XsltExtensionFunction.java ext/nokogiri/depend ext/nokogiri/extconf.rb ext/nokogiri/html_document.c ext/nokogiri/html_document.h ext/nokogiri/html_element_description.c ext/nokogiri/html_element_description.h ext/nokogiri/html_entity_lookup.c ext/nokogiri/html_entity_lookup.h ext/nokogiri/html_sax_parser_context.c ext/nokogiri/html_sax_parser_context.h ext/nokogiri/html_sax_push_parser.c ext/nokogiri/html_sax_push_parser.h ext/nokogiri/nokogiri.c ext/nokogiri/nokogiri.h ext/nokogiri/xml_attr.c ext/nokogiri/xml_attr.h ext/nokogiri/xml_attribute_decl.c 
ext/nokogiri/xml_attribute_decl.h ext/nokogiri/xml_cdata.c ext/nokogiri/xml_cdata.h ext/nokogiri/xml_comment.c ext/nokogiri/xml_comment.h ext/nokogiri/xml_document.c ext/nokogiri/xml_document.h ext/nokogiri/xml_document_fragment.c ext/nokogiri/xml_document_fragment.h ext/nokogiri/xml_dtd.c ext/nokogiri/xml_dtd.h ext/nokogiri/xml_element_content.c ext/nokogiri/xml_element_content.h ext/nokogiri/xml_element_decl.c ext/nokogiri/xml_element_decl.h ext/nokogiri/xml_encoding_handler.c ext/nokogiri/xml_encoding_handler.h ext/nokogiri/xml_entity_decl.c ext/nokogiri/xml_entity_decl.h ext/nokogiri/xml_entity_reference.c ext/nokogiri/xml_entity_reference.h ext/nokogiri/xml_io.c ext/nokogiri/xml_io.h ext/nokogiri/xml_libxml2_hacks.c ext/nokogiri/xml_libxml2_hacks.h ext/nokogiri/xml_namespace.c ext/nokogiri/xml_namespace.h ext/nokogiri/xml_node.c ext/nokogiri/xml_node.h ext/nokogiri/xml_node_set.c ext/nokogiri/xml_node_set.h ext/nokogiri/xml_processing_instruction.c ext/nokogiri/xml_processing_instruction.h ext/nokogiri/xml_reader.c ext/nokogiri/xml_reader.h ext/nokogiri/xml_relax_ng.c ext/nokogiri/xml_relax_ng.h ext/nokogiri/xml_sax_parser.c ext/nokogiri/xml_sax_parser.h ext/nokogiri/xml_sax_parser_context.c ext/nokogiri/xml_sax_parser_context.h ext/nokogiri/xml_sax_push_parser.c ext/nokogiri/xml_sax_push_parser.h ext/nokogiri/xml_schema.c ext/nokogiri/xml_schema.h ext/nokogiri/xml_syntax_error.c ext/nokogiri/xml_syntax_error.h ext/nokogiri/xml_text.c ext/nokogiri/xml_text.h ext/nokogiri/xml_xpath_context.c ext/nokogiri/xml_xpath_context.h ext/nokogiri/xslt_stylesheet.c ext/nokogiri/xslt_stylesheet.h lib/isorelax.jar lib/jing.jar lib/nekodtd.jar lib/nekohtml.jar lib/nokogiri.rb lib/nokogiri/css.rb lib/nokogiri/css/node.rb lib/nokogiri/css/parser.rb lib/nokogiri/css/parser.y lib/nokogiri/css/parser_extras.rb lib/nokogiri/css/syntax_error.rb lib/nokogiri/css/tokenizer.rb lib/nokogiri/css/tokenizer.rex lib/nokogiri/css/xpath_visitor.rb lib/nokogiri/decorators/slop.rb 
lib/nokogiri/html.rb lib/nokogiri/html/builder.rb lib/nokogiri/html/document.rb lib/nokogiri/html/document_fragment.rb lib/nokogiri/html/element_description.rb lib/nokogiri/html/element_description_defaults.rb lib/nokogiri/html/entity_lookup.rb lib/nokogiri/html/sax/parser.rb lib/nokogiri/html/sax/parser_context.rb lib/nokogiri/html/sax/push_parser.rb lib/nokogiri/syntax_error.rb lib/nokogiri/version.rb lib/nokogiri/xml.rb lib/nokogiri/xml/attr.rb lib/nokogiri/xml/attribute_decl.rb lib/nokogiri/xml/builder.rb lib/nokogiri/xml/cdata.rb lib/nokogiri/xml/character_data.rb lib/nokogiri/xml/document.rb lib/nokogiri/xml/document_fragment.rb lib/nokogiri/xml/dtd.rb lib/nokogiri/xml/element_content.rb lib/nokogiri/xml/element_decl.rb lib/nokogiri/xml/entity_decl.rb lib/nokogiri/xml/namespace.rb lib/nokogiri/xml/node.rb lib/nokogiri/xml/node/save_options.rb lib/nokogiri/xml/node_set.rb lib/nokogiri/xml/notation.rb lib/nokogiri/xml/parse_options.rb lib/nokogiri/xml/pp.rb lib/nokogiri/xml/pp/character_data.rb lib/nokogiri/xml/pp/node.rb lib/nokogiri/xml/processing_instruction.rb lib/nokogiri/xml/reader.rb lib/nokogiri/xml/relax_ng.rb lib/nokogiri/xml/sax.rb lib/nokogiri/xml/sax/document.rb lib/nokogiri/xml/sax/parser.rb lib/nokogiri/xml/sax/parser_context.rb lib/nokogiri/xml/sax/push_parser.rb lib/nokogiri/xml/schema.rb lib/nokogiri/xml/syntax_error.rb lib/nokogiri/xml/text.rb lib/nokogiri/xml/xpath.rb lib/nokogiri/xml/xpath/syntax_error.rb lib/nokogiri/xml/xpath_context.rb lib/nokogiri/xslt.rb lib/nokogiri/xslt/stylesheet.rb lib/xercesImpl.jar lib/xsd/xmlparser/nokogiri.rb tasks/cross_compile.rb tasks/nokogiri.org.rb tasks/test.rb test/css/test_nthiness.rb test/css/test_parser.rb test/css/test_tokenizer.rb test/css/test_xpath_visitor.rb test/decorators/test_slop.rb test/files/2ch.html test/files/address_book.rlx test/files/address_book.xml test/files/bar/bar.xsd test/files/bogus.xml test/files/dont_hurt_em_why.xml test/files/encoding.html test/files/encoding.xhtml 
test/files/exslt.xml test/files/exslt.xslt test/files/foo/foo.xsd test/files/metacharset.html test/files/noencoding.html test/files/po.xml test/files/po.xsd test/files/saml/saml20assertion_schema.xsd test/files/saml/saml20protocol_schema.xsd test/files/saml/xenc_schema.xsd test/files/saml/xmldsig_schema.xsd test/files/shift_jis.html test/files/shift_jis.xml test/files/snuggles.xml test/files/staff.dtd test/files/staff.xml test/files/staff.xslt test/files/test_document_url/bar.xml test/files/test_document_url/document.dtd test/files/test_document_url/document.xml test/files/tlm.html test/files/to_be_xincluded.xml test/files/valid_bar.xml test/files/xinclude.xml test/helper.rb test/html/sax/test_parser.rb test/html/sax/test_parser_context.rb test/html/test_builder.rb test/html/test_document.rb test/html/test_document_encoding.rb test/html/test_document_fragment.rb test/html/test_element_description.rb test/html/test_named_characters.rb test/html/test_node.rb test/html/test_node_encoding.rb test/namespaces/test_additional_namespaces_in_builder_doc.rb test/namespaces/test_namespaces_in_builder_doc.rb test/namespaces/test_namespaces_in_created_doc.rb test/namespaces/test_namespaces_in_parsed_doc.rb test/test_convert_xpath.rb test/test_css_cache.rb test/test_encoding_handler.rb test/test_memory_leak.rb test/test_nokogiri.rb test/test_reader.rb test/test_soap4r_sax.rb test/test_xslt_transforms.rb test/xml/node/test_save_options.rb test/xml/node/test_subclass.rb test/xml/sax/test_parser.rb test/xml/sax/test_parser_context.rb test/xml/sax/test_push_parser.rb test/xml/test_attr.rb test/xml/test_attribute_decl.rb test/xml/test_builder.rb test/xml/test_c14n.rb test/xml/test_cdata.rb test/xml/test_comment.rb test/xml/test_document.rb test/xml/test_document_encoding.rb test/xml/test_document_fragment.rb test/xml/test_dtd.rb test/xml/test_dtd_encoding.rb test/xml/test_element_content.rb test/xml/test_element_decl.rb test/xml/test_entity_decl.rb test/xml/test_entity_reference.rb 
test/xml/test_namespace.rb test/xml/test_node.rb test/xml/test_node_attributes.rb test/xml/test_node_encoding.rb test/xml/test_node_inheritance.rb test/xml/test_node_reparenting.rb test/xml/test_node_set.rb test/xml/test_parse_options.rb test/xml/test_processing_instruction.rb test/xml/test_reader_encoding.rb test/xml/test_relax_ng.rb test/xml/test_schema.rb test/xml/test_syntax_error.rb test/xml/test_text.rb test/xml/test_unparented_node.rb test/xml/test_xinclude.rb test/xml/test_xpath.rb test/xslt/test_custom_functions.rb test/xslt/test_exception_handling.rb test_all
nokogiri-1.6.1/tasks/0000755000175000017500000000000012261213762014045 5ustar boutilboutil
nokogiri-1.6.1/tasks/cross_compile.rb0000644000175000017500000000745312261213762017244 0ustar boutilboutil
gem 'rake-compiler'
require 'rake/extensioncompiler'
HOST = Rake::ExtensionCompiler.mingw_host

require 'resolv'
require 'yaml' # needed for YAML.load_file below
require 'mini_portile'
dependencies = YAML.load_file("dependencies.yml")
$recipes = {}
%w[zlib libiconv libxml2 libxslt].each do |lib|
  $recipes[lib] = MiniPortile.new lib, dependencies[lib]
end
$recipes.each { |_, recipe| recipe.host = HOST }

# Write a shim that requires the compiled extension matching the
# running Ruby's major.minor version.
file "lib/nokogiri/nokogiri.rb" do
  File.open("lib/nokogiri/nokogiri.rb", 'wb') do |f|
    f.write %Q{require "nokogiri/\#{RUBY_VERSION.sub(/\\.\\d+$/, '')}/nokogiri"\n}
  end
end

namespace :cross do
  task :zlib do
    recipe = $recipes["zlib"]
    recipe.files = ["http://zlib.net/#{recipe.name}-#{recipe.version}.tar.gz"]
    class << recipe
      def configure
        Dir.chdir work_path do
          mk = File.read 'win32/Makefile.gcc'
          File.open 'win32/Makefile.gcc', 'wb' do |f|
            f.puts "BINARY_PATH = #{CROSS_DIR}/bin"
            f.puts "LIBRARY_PATH = #{CROSS_DIR}/lib"
            f.puts "INCLUDE_PATH = #{CROSS_DIR}/include"
            f.puts mk.sub(/^PREFIX\s*=\s*$/, "PREFIX = #{HOST}-")
          end
        end
      end

      def configured?
        Dir.chdir work_path do
          !!
          (File.read('win32/Makefile.gcc') =~ /^BINARY_PATH/)
        end
      end

      def compile
        execute "compile", "make -f win32/Makefile.gcc"
      end

      def install
        execute "install", "make -f win32/Makefile.gcc install"
      end
    end

    checkpoint = "#{CROSS_DIR}/#{recipe.name}-#{recipe.version}-#{recipe.host}.installed"
    unless File.exist?(checkpoint)
      recipe.cook
      touch checkpoint
    end
    recipe.activate
  end

  task :libiconv do
    recipe = $recipes["libiconv"]
    recipe.files = ["http://ftp.gnu.org/pub/gnu/libiconv/#{recipe.name}-#{recipe.version}.tar.gz"]
    recipe.configure_options = [
      "--host=#{HOST}",
      "--enable-static",
      "--disable-shared",
      "CPPFLAGS='-mno-cygwin -Wall'",
      "CFLAGS='-mno-cygwin -O2 -g'",
      "CXXFLAGS='-mno-cygwin -O2 -g'",
      "LDFLAGS=-mno-cygwin"
    ]

    checkpoint = "#{CROSS_DIR}/#{recipe.name}-#{recipe.version}-#{recipe.host}.installed"
    unless File.exist?(checkpoint)
      recipe.cook
      touch checkpoint
    end
    recipe.activate
  end

  task :libxml2 => ["cross:zlib", "cross:libiconv"] do
    recipe = $recipes["libxml2"]
    recipe.files = ["ftp://ftp.xmlsoft.org/libxml2/#{recipe.name}-#{recipe.version}.tar.gz"]
    recipe.configure_options = [
      "--host=#{HOST}",
      "--enable-static",
      "--disable-shared",
      "--with-zlib=#{CROSS_DIR}",
      "--with-iconv=#{$recipes["libiconv"].path}",
      "--without-python",
      "--without-readline",
      "CFLAGS='-DIN_LIBXML'"
    ]

    checkpoint = "#{CROSS_DIR}/#{recipe.name}-#{recipe.version}-#{recipe.host}.installed"
    unless File.exist?(checkpoint)
      recipe.cook
      touch checkpoint
    end
    recipe.activate
  end

  task :libxslt => ['cross:libxml2'] do
    recipe = $recipes["libxslt"]
    recipe.files = ["ftp://ftp.xmlsoft.org/libxml2/#{recipe.name}-#{recipe.version}.tar.gz"]
    recipe.configure_options = [
      "--host=#{HOST}",
      "--enable-static",
      "--disable-shared",
      "--with-libxml-prefix=#{$recipes["libxml2"].path}",
      "--without-python",
      "--without-crypto",
      "CFLAGS='-DIN_LIBXML'"
    ]

    checkpoint = "#{CROSS_DIR}/#{recipe.name}-#{recipe.version}-#{recipe.host}.installed"
    unless File.exist?(checkpoint)
      recipe.cook
      touch checkpoint
    end
    recipe.activate
  end

  task
  :file_list do
    HOE.spec.files += Dir["lib/nokogiri/nokogiri.rb"]
    HOE.spec.files += Dir["lib/nokogiri/{1.9,2.0}/nokogiri.so"]
  end
end

require 'rake/clean'
CLOBBER.include("#{CROSS_DIR}/*.installed", "#{CROSS_DIR}/#{HOST}", "tmp/#{HOST}")

task :cross => ["cross:libxslt", "lib/nokogiri/nokogiri.rb", "cross:file_list"]
nokogiri-1.6.1/tasks/test.rb0000644000175000017500000000612612261213762015356 0ustar boutilboutil
namespace :test do
  desc "run test suite with aggressive GC"
  task :gc => :build do
    ENV['NOKOGIRI_GC'] = "true"
    Rake::Task["test"].invoke
  end

  desc "find call-seq in the rdoc"
  task :rdoc_call_seq => 'docs' do
    Dir['doc/**/*.html'].each { |docfile|
      next if docfile =~ /\.src/
      puts "FAIL: #{docfile}" if File.read(docfile) =~ /call-seq/
    }
  end

  desc "find all undocumented things"
  task :rdoc => 'docs' do
    base = File.expand_path(File.join(File.dirname(__FILE__), '..', 'doc'))
    require 'test/unit'
    test = Class.new(Test::Unit::TestCase)
    Dir["#{base}/**/*.html"].each { |docfile|
      test.class_eval(<<-eotest)
        def test_#{docfile.sub("#{base}/", '').gsub(/[\/\.-]/, '_')}
          assert_no_match(
            /Not documented/,
            File.read('#{docfile}'),
            '#{docfile} has undocumented things'
          )
        end
      eotest
    }
  end

  desc "Test against multiple versions of libxml2 (MULTIXML2_DIR=directory)"
  task :multixml2 do
    MULTI_XML = File.join(ENV['HOME'], '.multixml2')
    unless File.exists?(MULTI_XML)
      %w{ versions install build }.each { |x|
        FileUtils.mkdir_p(File.join(MULTI_XML, x))
      }
      Dir.chdir File.join(MULTI_XML, 'versions') do
        require 'net/ftp'
        puts "Contacting xmlsoft.org ..."
        ftp = Net::FTP.new('xmlsoft.org')
        ftp.login('anonymous', 'anonymous')
        ftp.chdir('libxml2')
        ftp.list('libxml2-2.*.tar.gz').each do |x|
          file = x[/[^\s]*$/]
          puts "Downloading #{file}"
          ftp.getbinaryfile(file)
        end
      end
    end

    # Build any libxml2 versions in $HOME/.multixml2/versions that
    # haven't been built yet
    Dir[File.join(MULTI_XML, 'versions','*.tar.gz')].each do |f|
      filename = File.basename(f, '.tar.gz')

      install_dir = File.join(MULTI_XML, 'install', filename)
      next if File.exists?(install_dir)

      Dir.chdir File.join(MULTI_XML, 'versions') do
        system "tar zxvf #{f} -C #{File.join(MULTI_XML, 'build')}"
      end

      Dir.chdir File.join(MULTI_XML, 'build', filename) do
        system "./configure --without-http --prefix=#{install_dir}"
        system "make && make install"
      end
    end

    test_results = {}
    libxslt = Dir[File.join(MULTI_XML, 'install', 'libxslt*')].first

    directories = ENV['MULTIXML2_DIR'] ? [ENV['MULTIXML2_DIR']] : Dir[File.join(MULTI_XML, 'install', '*')]
    directories.sort.reverse_each do |xml2_version|
      next unless xml2_version =~ /libxml2/
      extopts = "--with-xml2-include=#{xml2_version}/include/libxml2 --with-xml2-lib=#{xml2_version}/lib --with-xslt-dir=#{libxslt} --with-iconv-dir=/usr"
      cmd = "#{$0} clean test EXTOPTS='#{extopts}' LD_LIBRARY_PATH='#{xml2_version}/lib'"

      version = File.basename(xml2_version)
      result = system(cmd)
      test_results[version] = {
        :result => result,
        :cmd => cmd
      }
    end
    test_results.sort_by { |k,v| k }.each do |k,v|
      passed = v[:result]
      puts "#{k}: #{passed ? 'PASS' : 'FAIL'}"
      puts "repro: #{v[:cmd]}" unless passed
    end
  end
end
nokogiri-1.6.1/tasks/nokogiri.org.rb0000644000175000017500000000141512261213762017002 0ustar boutilboutil
#
# note that this file will only work if you've got the `nokogiri.org`
# repo checked out, and you've got an rvm gemset "1.8.7@nokogiri"
# bundled with both nokogiri's and nokogiri.org's gems.
#
namespace :docs do
  desc "generate HTML docs for nokogiri.org"
  task :website do
    system 'rvm use 1.8.7@nokogiri' # see above
    title = "#{HOE.name}-#{HOE.version} Documentation"

    options = []
    options << "--main=#{HOE.readme_file}"
    options << '--format=activerecord'
    options << '--threads=1'
    options << "--title=#{title.inspect}"

    options += HOE.spec.require_paths
    options += HOE.spec.extra_rdoc_files
    require 'rdoc/rdoc'
    ENV['RAILS_ROOT'] ||= File.expand_path(File.join('..', 'nokogiri_ws'))
    RDoc::RDoc.new.document options
  end
end
nokogiri-1.6.1/dependencies.yml0000644000175000017500000000010412261213762016064 0ustar boutilboutil
libxml2: "2.8.0"
libxslt: "1.1.26"
zlib: "1.2.7"
libiconv: "1.13.1"
nokogiri-1.6.1/ports/0000755000175000017500000000000012261213762014067 5ustar boutilboutil
nokogiri-1.6.1/ext/0000755000175000017500000000000012261213762013520 5ustar boutilboutil
nokogiri-1.6.1/ext/nokogiri/0000755000175000017500000000000012261213762015341 5ustar boutilboutil
nokogiri-1.6.1/ext/nokogiri/html_element_description.c0000644000175000017500000001341412261213762022570 0ustar boutilboutil
#include

/*
 * call-seq:
 *  required_attributes
 *
 * A list of required attributes for this element
 */
static VALUE required_attributes(VALUE self)
{
  htmlElemDesc * description;
  VALUE list;
  int i;

  Data_Get_Struct(self, htmlElemDesc, description);

  list = rb_ary_new();

  if(NULL == description->attrs_req) return list;

  for(i = 0; description->attrs_req[i]; i++) {
    rb_ary_push(list, NOKOGIRI_STR_NEW2(description->attrs_req[i]));
  }

  return list;
}

/*
 * call-seq:
 *  deprecated_attributes
 *
 * A list of deprecated attributes for this element
 */
static VALUE deprecated_attributes(VALUE self)
{
  htmlElemDesc * description;
  VALUE list;
  int i;

  Data_Get_Struct(self, htmlElemDesc, description);

  list = rb_ary_new();

  if(NULL == description->attrs_depr) return list;

  for(i = 0; description->attrs_depr[i]; i++) {
    rb_ary_push(list, NOKOGIRI_STR_NEW2(description->attrs_depr[i]));
  }

  return list;
}

/*
 * call-seq:
 *  optional_attributes
 *
 * A list of optional attributes for this element
 */
static VALUE optional_attributes(VALUE self)
{
  htmlElemDesc * description;
  VALUE list;
  int i;

  Data_Get_Struct(self, htmlElemDesc, description);

  list = rb_ary_new();

  if(NULL == description->attrs_opt) return list;

  for(i = 0; description->attrs_opt[i]; i++) {
    rb_ary_push(list, NOKOGIRI_STR_NEW2(description->attrs_opt[i]));
  }

  return list;
}

/*
 * call-seq:
 *  default_sub_element
 *
 * The default sub element for this element
 */
static VALUE default_sub_element(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if (description->defaultsubelt)
    return NOKOGIRI_STR_NEW2(description->defaultsubelt);

  return Qnil;
}

/*
 * call-seq:
 *  sub_elements
 *
 * A list of allowed sub elements for this element.
 */
static VALUE sub_elements(VALUE self)
{
  htmlElemDesc * description;
  VALUE list;
  int i;

  Data_Get_Struct(self, htmlElemDesc, description);

  list = rb_ary_new();

  if(NULL == description->subelts) return list;

  for(i = 0; description->subelts[i]; i++) {
    rb_ary_push(list, NOKOGIRI_STR_NEW2(description->subelts[i]));
  }

  return list;
}

/*
 * call-seq:
 *  description
 *
 * The description for this element
 */
static VALUE description(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  return NOKOGIRI_STR_NEW2(description->desc);
}

/*
 * call-seq:
 *  inline?
 *
 * Is this element an inline element?
 */
static VALUE inline_eh(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if(description->isinline) return Qtrue;
  return Qfalse;
}

/*
 * call-seq:
 *  deprecated?
 *
 * Is this element deprecated?
 */
static VALUE deprecated_eh(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if(description->depr) return Qtrue;
  return Qfalse;
}

/*
 * call-seq:
 *  empty?
 *
 * Is this an empty element?
 */
static VALUE empty_eh(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if(description->empty) return Qtrue;
  return Qfalse;
}

/*
 * call-seq:
 *  save_end_tag?
 *
 * Should the end tag be saved?
 */
static VALUE save_end_tag_eh(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if(description->saveEndTag) return Qtrue;
  return Qfalse;
}

/*
 * call-seq:
 *  implied_end_tag?
 *
 * Can the end tag be implied for this tag?
 */
static VALUE implied_end_tag_eh(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if(description->endTag) return Qtrue;
  return Qfalse;
}

/*
 * call-seq:
 *  implied_start_tag?
 *
 * Can the start tag be implied for this tag?
 */
static VALUE implied_start_tag_eh(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if(description->startTag) return Qtrue;
  return Qfalse;
}

/*
 * call-seq:
 *  name
 *
 * Get the tag name for this ElementDescription
 */
static VALUE name(VALUE self)
{
  htmlElemDesc * description;
  Data_Get_Struct(self, htmlElemDesc, description);

  if(NULL == description->name) return Qnil;
  return NOKOGIRI_STR_NEW2(description->name);
}

/*
 * call-seq:
 *  [](tag_name)
 *
 * Get ElementDescription for +tag_name+
 */
static VALUE get_description(VALUE klass, VALUE tag_name)
{
  const htmlElemDesc * description = htmlTagLookup(
      (const xmlChar *)StringValuePtr(tag_name)
  );

  if(NULL == description) return Qnil;
  return Data_Wrap_Struct(klass, 0, 0, (void *)description);
}

VALUE cNokogiriHtmlElementDescription ;
void init_html_element_description()
{
  VALUE nokogiri = rb_define_module("Nokogiri");
  VALUE html     = rb_define_module_under(nokogiri, "HTML");
  VALUE klass    = rb_define_class_under(html, "ElementDescription",rb_cObject);

  cNokogiriHtmlElementDescription = klass;

  rb_define_singleton_method(klass, "[]", get_description, 1);

  rb_define_method(klass, "name", name, 0);
  rb_define_method(klass,
                   "implied_start_tag?", implied_start_tag_eh, 0);
  rb_define_method(klass, "implied_end_tag?", implied_end_tag_eh, 0);
  rb_define_method(klass, "save_end_tag?", save_end_tag_eh, 0);
  rb_define_method(klass, "empty?", empty_eh, 0);
  rb_define_method(klass, "deprecated?", deprecated_eh, 0);
  rb_define_method(klass, "inline?", inline_eh, 0);
  rb_define_method(klass, "description", description, 0);
  rb_define_method(klass, "sub_elements", sub_elements, 0);
  rb_define_method(klass, "default_sub_element", default_sub_element, 0);
  rb_define_method(klass, "optional_attributes", optional_attributes, 0);
  rb_define_method(klass, "deprecated_attributes", deprecated_attributes, 0);
  rb_define_method(klass, "required_attributes", required_attributes, 0);
}
nokogiri-1.6.1/ext/nokogiri/html_entity_lookup.c0000644000175000017500000000145412261213762021442 0ustar boutilboutil
#include

/*
 * call-seq:
 *  get(key)
 *
 * Get the HTML::EntityDescription for +key+
 */
static VALUE get(VALUE self, VALUE key)
{
  const htmlEntityDesc * desc =
    htmlEntityLookup((const xmlChar *)StringValuePtr(key));
  VALUE klass, args[3];

  if(NULL == desc) return Qnil;
  klass = rb_const_get(mNokogiriHtml, rb_intern("EntityDescription"));

  args[0] = INT2NUM((long)desc->value);
  args[1] = NOKOGIRI_STR_NEW2(desc->name);
  args[2] = NOKOGIRI_STR_NEW2(desc->desc);

  return rb_class_new_instance(3, args, klass);
}

void init_html_entity_lookup()
{
  VALUE nokogiri = rb_define_module("Nokogiri");
  VALUE html     = rb_define_module_under(nokogiri, "HTML");
  VALUE klass    = rb_define_class_under(html, "EntityLookup", rb_cObject);

  rb_define_method(klass, "get", get, 1);
}
nokogiri-1.6.1/ext/nokogiri/xml_document.h0000644000175000017500000000130112261213762020203 0ustar boutilboutil
#ifndef NOKOGIRI_XML_DOCUMENT
#define NOKOGIRI_XML_DOCUMENT

#include

struct _nokogiriTuple {
  VALUE doc;
  st_table *unlinkedNodes;
  VALUE node_cache;
};
typedef struct _nokogiriTuple nokogiriTuple;
typedef nokogiriTuple * nokogiriTuplePtr;

void init_xml_document();
VALUE
Nokogiri_wrap_xml_document(VALUE klass, xmlDocPtr doc); #define DOC_RUBY_OBJECT_TEST(x) ((nokogiriTuplePtr)(x->_private)) #define DOC_RUBY_OBJECT(x) (((nokogiriTuplePtr)(x->_private))->doc) #define DOC_UNLINKED_NODE_HASH(x) (((nokogiriTuplePtr)(x->_private))->unlinkedNodes) #define DOC_NODE_CACHE(x) (((nokogiriTuplePtr)(x->_private))->node_cache) extern VALUE cNokogiriXmlDocument ; #endif nokogiri-1.6.1/ext/nokogiri/xml_libxml2_hacks.h0000644000175000017500000000040212261213762021110 0ustar boutilboutil#ifndef HAVE_XMLFIRSTELEMENTCHILD #ifndef XML_LIBXML2_HACKS #define XML_LIBXML2_HACKS xmlNodePtr xmlFirstElementChild(xmlNodePtr parent); xmlNodePtr xmlNextElementSibling(xmlNodePtr node); xmlNodePtr xmlLastElementChild(xmlNodePtr parent); #endif #endif nokogiri-1.6.1/ext/nokogiri/xml_comment.h0000644000175000017500000000022512261213762020033 0ustar boutilboutil#ifndef NOKOGIRI_XML_COMMENT #define NOKOGIRI_XML_COMMENT #include void init_xml_comment(); extern VALUE cNokogiriXmlComment; #endif nokogiri-1.6.1/ext/nokogiri/xml_entity_reference.c0000644000175000017500000000224412261213762021721 0ustar boutilboutil#include /* * call-seq: * new(document, name) * * Create a new EntityReference element on the +document+ with +name+ */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr xml_doc; xmlNodePtr node; VALUE document; VALUE name; VALUE rest; VALUE rb_node; rb_scan_args(argc, argv, "2*", &document, &name, &rest); Data_Get_Struct(document, xmlDoc, xml_doc); node = xmlNewReference( xml_doc, (const xmlChar *)StringValuePtr(name) ); nokogiri_root_node(node); rb_node = Nokogiri_wrap_xml_node(klass, node); rb_obj_call_init(rb_node, argc, argv); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } VALUE cNokogiriXmlEntityReference; void init_xml_entity_reference() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); /* * EntityReference 
represents an EntityReference node in an xml document. */ VALUE klass = rb_define_class_under(xml, "EntityReference", node); cNokogiriXmlEntityReference = klass; rb_define_singleton_method(klass, "new", new, -1); } nokogiri-1.6.1/ext/nokogiri/xml_relax_ng.c0000644000175000017500000000711412261213762020167 0ustar boutilboutil#include static void dealloc(xmlRelaxNGPtr schema) { NOKOGIRI_DEBUG_START(schema); xmlRelaxNGFree(schema); NOKOGIRI_DEBUG_END(schema); } /* * call-seq: * validate_document(document) * * Validate a Nokogiri::XML::Document against this RelaxNG schema. */ static VALUE validate_document(VALUE self, VALUE document) { xmlDocPtr doc; xmlRelaxNGPtr schema; VALUE errors; xmlRelaxNGValidCtxtPtr valid_ctxt; Data_Get_Struct(self, xmlRelaxNG, schema); Data_Get_Struct(document, xmlDoc, doc); errors = rb_ary_new(); valid_ctxt = xmlRelaxNGNewValidCtxt(schema); if(NULL == valid_ctxt) { /* we have a problem */ rb_raise(rb_eRuntimeError, "Could not create a validation context"); } #ifdef HAVE_XMLRELAXNGSETVALIDSTRUCTUREDERRORS xmlRelaxNGSetValidStructuredErrors( valid_ctxt, Nokogiri_error_array_pusher, (void *)errors ); #endif xmlRelaxNGValidateDoc(valid_ctxt, doc); xmlRelaxNGFreeValidCtxt(valid_ctxt); return errors; } /* * call-seq: * read_memory(string) * * Create a new RelaxNG from the contents of +string+ */ static VALUE read_memory(VALUE klass, VALUE content) { xmlRelaxNGParserCtxtPtr ctx = xmlRelaxNGNewMemParserCtxt( (const char *)StringValuePtr(content), (int)RSTRING_LEN(content) ); xmlRelaxNGPtr schema; VALUE errors = rb_ary_new(); VALUE rb_schema; xmlSetStructuredErrorFunc((void *)errors, Nokogiri_error_array_pusher); #ifdef HAVE_XMLRELAXNGSETPARSERSTRUCTUREDERRORS xmlRelaxNGSetParserStructuredErrors( ctx, Nokogiri_error_array_pusher, (void *)errors ); #endif schema = xmlRelaxNGParse(ctx); xmlSetStructuredErrorFunc(NULL, NULL); xmlRelaxNGFreeParserCtxt(ctx); if(NULL == schema) { xmlErrorPtr error = xmlGetLastError(); if(error) Nokogiri_error_raise(NULL, 
error); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } rb_schema = Data_Wrap_Struct(klass, 0, dealloc, schema); rb_iv_set(rb_schema, "@errors", errors); return rb_schema; } /* * call-seq: * from_document(doc) * * Create a new RelaxNG schema from the Nokogiri::XML::Document +doc+ */ static VALUE from_document(VALUE klass, VALUE document) { xmlDocPtr doc; xmlRelaxNGParserCtxtPtr ctx; xmlRelaxNGPtr schema; VALUE errors; VALUE rb_schema; Data_Get_Struct(document, xmlDoc, doc); /* In case someone passes us a node. ugh. */ doc = doc->doc; ctx = xmlRelaxNGNewDocParserCtxt(doc); errors = rb_ary_new(); xmlSetStructuredErrorFunc((void *)errors, Nokogiri_error_array_pusher); #ifdef HAVE_XMLRELAXNGSETPARSERSTRUCTUREDERRORS xmlRelaxNGSetParserStructuredErrors( ctx, Nokogiri_error_array_pusher, (void *)errors ); #endif schema = xmlRelaxNGParse(ctx); xmlSetStructuredErrorFunc(NULL, NULL); /* Free the parser context, as read_memory does, so it is not leaked */ xmlRelaxNGFreeParserCtxt(ctx); if(NULL == schema) { xmlErrorPtr error = xmlGetLastError(); if(error) Nokogiri_error_raise(NULL, error); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } rb_schema = Data_Wrap_Struct(klass, 0, dealloc, schema); rb_iv_set(rb_schema, "@errors", errors); return rb_schema; } VALUE cNokogiriXmlRelaxNG; void init_xml_relax_ng() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE klass = rb_define_class_under(xml, "RelaxNG", cNokogiriXmlSchema); cNokogiriXmlRelaxNG = klass; rb_define_singleton_method(klass, "read_memory", read_memory, 1); rb_define_singleton_method(klass, "from_document", from_document, 1); rb_define_private_method(klass, "validate_document", validate_document, 1); } nokogiri-1.6.1/ext/nokogiri/xml_entity_reference.h0000644000175000017500000000027012261213762021723 0ustar boutilboutil#ifndef NOKOGIRI_XML_ENTITY_REFERENCE #define NOKOGIRI_XML_ENTITY_REFERENCE #include void init_xml_entity_reference(); extern VALUE cNokogiriXmlEntityReference; #endif 
nokogiri-1.6.1/ext/nokogiri/html_document.c0000644000175000017500000001065012261213762020351 0ustar boutilboutil#include static ID id_encoding_found; /* * call-seq: * new * * Create a new document */ static VALUE new(int argc, VALUE *argv, VALUE klass) { VALUE uri, external_id, rest, rb_doc; htmlDocPtr doc; rb_scan_args(argc, argv, "0*", &rest); uri = rb_ary_entry(rest, (long)0); external_id = rb_ary_entry(rest, (long)1); doc = htmlNewDoc( RTEST(uri) ? (const xmlChar *)StringValuePtr(uri) : NULL, RTEST(external_id) ? (const xmlChar *)StringValuePtr(external_id) : NULL ); rb_doc = Nokogiri_wrap_xml_document(klass, doc); rb_obj_call_init(rb_doc, argc, argv); return rb_doc ; } /* * call-seq: * read_io(io, url, encoding, options) * * Read the HTML document from +io+ with given +url+, +encoding+, * and +options+. See Nokogiri::HTML.parse */ static VALUE read_io( VALUE klass, VALUE io, VALUE url, VALUE encoding, VALUE options ) { const char * c_url = NIL_P(url) ? NULL : StringValuePtr(url); const char * c_enc = NIL_P(encoding) ? NULL : StringValuePtr(encoding); VALUE error_list = rb_ary_new(); VALUE document; htmlDocPtr doc; xmlResetLastError(); xmlSetStructuredErrorFunc((void *)error_list, Nokogiri_error_array_pusher); doc = htmlReadIO( io_read_callback, io_close_callback, (void *)io, c_url, c_enc, (int)NUM2INT(options) ); xmlSetStructuredErrorFunc(NULL, NULL); /* * If EncodingFound has occurred in EncodingReader, make sure to do * a cleanup and propagate the error. 
*/ if (rb_respond_to(io, id_encoding_found)) { VALUE encoding_found = rb_funcall(io, id_encoding_found, 0); if (!NIL_P(encoding_found)) { xmlFreeDoc(doc); rb_exc_raise(encoding_found); } } if(doc == NULL) { xmlErrorPtr error; xmlFreeDoc(doc); error = xmlGetLastError(); if(error) rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } document = Nokogiri_wrap_xml_document(klass, doc); rb_iv_set(document, "@errors", error_list); return document; } /* * call-seq: * read_memory(string, url, encoding, options) * * Read the HTML document contained in +string+ with given +url+, +encoding+, * and +options+. See Nokogiri::HTML.parse */ static VALUE read_memory( VALUE klass, VALUE string, VALUE url, VALUE encoding, VALUE options ) { const char * c_buffer = StringValuePtr(string); const char * c_url = NIL_P(url) ? NULL : StringValuePtr(url); const char * c_enc = NIL_P(encoding) ? NULL : StringValuePtr(encoding); int len = (int)RSTRING_LEN(string); VALUE error_list = rb_ary_new(); VALUE document; htmlDocPtr doc; xmlResetLastError(); xmlSetStructuredErrorFunc((void *)error_list, Nokogiri_error_array_pusher); doc = htmlReadMemory(c_buffer, len, c_url, c_enc, (int)NUM2INT(options)); xmlSetStructuredErrorFunc(NULL, NULL); if(doc == NULL) { xmlErrorPtr error; xmlFreeDoc(doc); error = xmlGetLastError(); if(error) rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } document = Nokogiri_wrap_xml_document(klass, doc); rb_iv_set(document, "@errors", error_list); return document; } /* * call-seq: * type * * The type for this document */ static VALUE type(VALUE self) { htmlDocPtr doc; Data_Get_Struct(self, xmlDoc, doc); return INT2NUM((long)doc->type); } VALUE cNokogiriHtmlDocument ; void init_html_document() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE html = rb_define_module_under(nokogiri, "HTML"); VALUE 
xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); VALUE xml_doc = rb_define_class_under(xml, "Document", node); VALUE klass = rb_define_class_under(html, "Document", xml_doc); cNokogiriHtmlDocument = klass; rb_define_singleton_method(klass, "read_memory", read_memory, 4); rb_define_singleton_method(klass, "read_io", read_io, 4); rb_define_singleton_method(klass, "new", new, -1); rb_define_method(klass, "type", type, 0); id_encoding_found = rb_intern("encoding_found"); } nokogiri-1.6.1/ext/nokogiri/xml_element_decl.h0000644000175000017500000000025012261213762021007 0ustar boutilboutil#ifndef NOKOGIRI_XML_ELEMENT_DECL #define NOKOGIRI_XML_ELEMENT_DECL #include void init_xml_element_decl(); extern VALUE cNokogiriXmlElementDecl; #endif nokogiri-1.6.1/ext/nokogiri/xml_libxml2_hacks.c0000644000175000017500000000563312261213762021116 0ustar boutilboutil#ifndef HAVE_XMLFIRSTELEMENTCHILD #include /** * xmlFirstElementChild: * @parent: the parent node * * Finds the first child node of that element which is a Element node * Note the handling of entities references is different than in * the W3C DOM element traversal spec since we don't have back reference * from entities content to entities references. * * Returns the first element child or NULL if not available */ xmlNodePtr xmlFirstElementChild(xmlNodePtr parent) { xmlNodePtr cur = NULL; if (parent == NULL) return(NULL); switch (parent->type) { case XML_ELEMENT_NODE: case XML_ENTITY_NODE: case XML_DOCUMENT_NODE: case XML_HTML_DOCUMENT_NODE: cur = parent->children; break; default: return(NULL); } while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) return(cur); cur = cur->next; } return(NULL); } /** * xmlNextElementSibling: * @node: the current node * * Finds the first closest next sibling of the node which is an * element node. 
* Note the handling of entities references is different than in * the W3C DOM element traversal spec since we don't have back reference * from entities content to entities references. * * Returns the next element sibling or NULL if not available */ xmlNodePtr xmlNextElementSibling(xmlNodePtr node) { if (node == NULL) return(NULL); switch (node->type) { case XML_ELEMENT_NODE: case XML_TEXT_NODE: case XML_CDATA_SECTION_NODE: case XML_ENTITY_REF_NODE: case XML_ENTITY_NODE: case XML_PI_NODE: case XML_COMMENT_NODE: case XML_DTD_NODE: case XML_XINCLUDE_START: case XML_XINCLUDE_END: node = node->next; break; default: return(NULL); } while (node != NULL) { if (node->type == XML_ELEMENT_NODE) return(node); node = node->next; } return(NULL); } /** * xmlLastElementChild: * @parent: the parent node * * Finds the last child node of that element which is a Element node * Note the handling of entities references is different than in * the W3C DOM element traversal spec since we don't have back reference * from entities content to entities references. 
* * Returns the last element child or NULL if not available */ xmlNodePtr xmlLastElementChild(xmlNodePtr parent) { xmlNodePtr cur = NULL; if (parent == NULL) return(NULL); switch (parent->type) { case XML_ELEMENT_NODE: case XML_ENTITY_NODE: case XML_DOCUMENT_NODE: case XML_HTML_DOCUMENT_NODE: cur = parent->last; break; default: return(NULL); } while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) return(cur); cur = cur->prev; } return(NULL); } #endif nokogiri-1.6.1/ext/nokogiri/xml_processing_instruction.h0000644000175000017500000000032012261213762023202 0ustar boutilboutil#ifndef NOKOGIRI_XML_PROCESSING_INSTRUCTION #define NOKOGIRI_XML_PROCESSING_INSTRUCTION #include void init_xml_processing_instruction(); extern VALUE cNokogiriXmlProcessingInstruction; #endif nokogiri-1.6.1/ext/nokogiri/xml_sax_parser_context.h0000644000175000017500000000030012261213762022276 0ustar boutilboutil#ifndef NOKOGIRI_XML_SAX_PARSER_CONTEXT #define NOKOGIRI_XML_SAX_PARSER_CONTEXT #include extern VALUE cNokogiriXmlSaxParserContext; void init_xml_sax_parser_context(); #endif nokogiri-1.6.1/ext/nokogiri/html_element_description.h0000644000175000017500000000031212261213762022566 0ustar boutilboutil#ifndef NOKOGIRI_HTML_ELEMENT_DESCRIPTION #define NOKOGIRI_HTML_ELEMENT_DESCRIPTION #include void init_html_element_description(); extern VALUE cNokogiriHtmlElementDescription ; #endif nokogiri-1.6.1/ext/nokogiri/xml_node_set.h0000644000175000017500000000053112261213762020171 0ustar boutilboutil#ifndef NOKOGIRI_XML_NODE_SET #define NOKOGIRI_XML_NODE_SET #include void init_xml_node_set(); extern VALUE cNokogiriXmlNodeSet ; VALUE Nokogiri_wrap_xml_node_set(xmlNodeSetPtr node_set, VALUE document) ; typedef struct _nokogiriNodeSetTuple { xmlNodeSetPtr node_set; st_table *namespaces; } nokogiriNodeSetTuple; #endif nokogiri-1.6.1/ext/nokogiri/xml_attribute_decl.h0000644000175000017500000000026012261213762021362 0ustar boutilboutil#ifndef NOKOGIRI_XML_ATTRIBUTE_DECL #define NOKOGIRI_XML_ATTRIBUTE_DECL 
#include void init_xml_attribute_decl(); extern VALUE cNokogiriXmlAttributeDecl; #endif nokogiri-1.6.1/ext/nokogiri/xml_node.h0000644000175000017500000000046112261213762017320 0ustar boutilboutil#ifndef NOKOGIRI_XML_NODE #define NOKOGIRI_XML_NODE #include void init_xml_node(); extern VALUE cNokogiriXmlNode ; extern VALUE cNokogiriXmlElement ; VALUE Nokogiri_wrap_xml_node(VALUE klass, xmlNodePtr node) ; void Nokogiri_xml_node_properties(xmlNodePtr node, VALUE attr_hash) ; #endif nokogiri-1.6.1/ext/nokogiri/xml_sax_parser_context.c0000644000175000017500000001142112261213762022277 0ustar boutilboutil#include VALUE cNokogiriXmlSaxParserContext ; static void deallocate(xmlParserCtxtPtr ctxt) { NOKOGIRI_DEBUG_START(ctxt); ctxt->sax = NULL; xmlFreeParserCtxt(ctxt); NOKOGIRI_DEBUG_END(ctxt); } /* * call-seq: * parse_io(io, encoding) * * Parse +io+ object with +encoding+ */ static VALUE parse_io(VALUE klass, VALUE io, VALUE encoding) { xmlParserCtxtPtr ctxt; xmlCharEncoding enc = (xmlCharEncoding)NUM2INT(encoding); ctxt = xmlCreateIOParserCtxt(NULL, NULL, (xmlInputReadCallback)io_read_callback, (xmlInputCloseCallback)io_close_callback, (void *)io, enc); if (ctxt->sax) { xmlFree(ctxt->sax); ctxt->sax = NULL; } return Data_Wrap_Struct(klass, NULL, deallocate, ctxt); } /* * call-seq: * parse_file(filename) * * Parse file given +filename+ */ static VALUE parse_file(VALUE klass, VALUE filename) { xmlParserCtxtPtr ctxt = xmlCreateFileParserCtxt(StringValuePtr(filename)); return Data_Wrap_Struct(klass, NULL, deallocate, ctxt); } /* * call-seq: * parse_memory(data) * * Parse the XML stored in memory in +data+ */ static VALUE parse_memory(VALUE klass, VALUE data) { xmlParserCtxtPtr ctxt; if (NIL_P(data)) rb_raise(rb_eArgError, "data cannot be nil"); if (!(int)RSTRING_LEN(data)) rb_raise(rb_eRuntimeError, "data cannot be empty"); ctxt = xmlCreateMemoryParserCtxt(StringValuePtr(data), (int)RSTRING_LEN(data)); if (ctxt->sax) { xmlFree(ctxt->sax); ctxt->sax = NULL; } return 
Data_Wrap_Struct(klass, NULL, deallocate, ctxt); } static VALUE parse_doc(VALUE ctxt_val) { xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr)ctxt_val; xmlParseDocument(ctxt); return Qnil; } static VALUE parse_doc_finalize(VALUE ctxt_val) { xmlParserCtxtPtr ctxt = (xmlParserCtxtPtr)ctxt_val; if (NULL != ctxt->myDoc) xmlFreeDoc(ctxt->myDoc); NOKOGIRI_SAX_TUPLE_DESTROY(ctxt->userData); return Qnil; } /* * call-seq: * parse_with(sax_handler) * * Use +sax_handler+ and parse the current document */ static VALUE parse_with(VALUE self, VALUE sax_handler) { xmlParserCtxtPtr ctxt; xmlSAXHandlerPtr sax; if (!rb_obj_is_kind_of(sax_handler, cNokogiriXmlSaxParser)) rb_raise(rb_eArgError, "argument must be a Nokogiri::XML::SAX::Parser"); Data_Get_Struct(self, xmlParserCtxt, ctxt); Data_Get_Struct(sax_handler, xmlSAXHandler, sax); /* Free the sax handler since we'll assign our own */ if (ctxt->sax && ctxt->sax != (xmlSAXHandlerPtr)&xmlDefaultSAXHandler) xmlFree(ctxt->sax); ctxt->sax = sax; ctxt->userData = (void *)NOKOGIRI_SAX_TUPLE_NEW(ctxt, sax_handler); rb_ensure(parse_doc, (VALUE)ctxt, parse_doc_finalize, (VALUE)ctxt); return Qnil; } /* * call-seq: * replace_entities=(boolean) * * Should this parser replace entities? &amp; will get converted to '&' if * set to true */ static VALUE set_replace_entities(VALUE self, VALUE value) { xmlParserCtxtPtr ctxt; Data_Get_Struct(self, xmlParserCtxt, ctxt); if(Qfalse == value) ctxt->replaceEntities = 0; else ctxt->replaceEntities = 1; return value; } /* * call-seq: * replace_entities * * Should this parser replace entities? &amp; will get converted to '&' if * set to true */ static VALUE get_replace_entities(VALUE self) { xmlParserCtxtPtr ctxt; Data_Get_Struct(self, xmlParserCtxt, ctxt); if(0 == ctxt->replaceEntities) return Qfalse; else return Qtrue; } /* * call-seq: line * * Get the current line the parser context is processing. 
*/ static VALUE line(VALUE self) { xmlParserCtxtPtr ctxt; xmlParserInputPtr io; Data_Get_Struct(self, xmlParserCtxt, ctxt); io = ctxt->input; if(io) return INT2NUM(io->line); return Qnil; } /* * call-seq: column * * Get the current column the parser context is processing. */ static VALUE column(VALUE self) { xmlParserCtxtPtr ctxt; xmlParserInputPtr io; Data_Get_Struct(self, xmlParserCtxt, ctxt); io = ctxt->input; if(io) return INT2NUM(io->col); return Qnil; } void init_xml_sax_parser_context() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE sax = rb_define_module_under(xml, "SAX"); VALUE klass = rb_define_class_under(sax, "ParserContext", rb_cObject); cNokogiriXmlSaxParserContext = klass; rb_define_singleton_method(klass, "io", parse_io, 2); rb_define_singleton_method(klass, "memory", parse_memory, 1); rb_define_singleton_method(klass, "file", parse_file, 1); rb_define_method(klass, "parse_with", parse_with, 1); rb_define_method(klass, "replace_entities=", set_replace_entities, 1); rb_define_method(klass, "replace_entities", get_replace_entities, 0); rb_define_method(klass, "line", line, 0); rb_define_method(klass, "column", column, 0); } nokogiri-1.6.1/ext/nokogiri/xml_element_content.h0000644000175000017500000000033212261213762021553 0ustar boutilboutil#ifndef NOKOGIRI_XML_ELEMENT_CONTENT #define NOKOGIRI_XML_ELEMENT_CONTENT #include VALUE Nokogiri_wrap_element_content(VALUE doc, xmlElementContentPtr element); void init_xml_element_content(); #endif nokogiri-1.6.1/ext/nokogiri/xml_entity_decl.c0000644000175000017500000000514412261213762020674 0ustar boutilboutil#include /* * call-seq: * original_content * * Get the original_content before ref substitution */ static VALUE original_content(VALUE self) { xmlEntityPtr node; Data_Get_Struct(self, xmlEntity, node); if(!node->orig) return Qnil; return NOKOGIRI_STR_NEW2(node->orig); } /* * call-seq: * content * * Get the content */ static VALUE get_content(VALUE 
self) { xmlEntityPtr node; Data_Get_Struct(self, xmlEntity, node); if(!node->content) return Qnil; return NOKOGIRI_STR_NEW(node->content, node->length); } /* * call-seq: * entity_type * * Get the entity type */ static VALUE entity_type(VALUE self) { xmlEntityPtr node; Data_Get_Struct(self, xmlEntity, node); return INT2NUM((int)node->etype); } /* * call-seq: * external_id * * Get the external identifier for PUBLIC */ static VALUE external_id(VALUE self) { xmlEntityPtr node; Data_Get_Struct(self, xmlEntity, node); if(!node->ExternalID) return Qnil; return NOKOGIRI_STR_NEW2(node->ExternalID); } /* * call-seq: * system_id * * Get the URI for a SYSTEM or PUBLIC Entity */ static VALUE system_id(VALUE self) { xmlEntityPtr node; Data_Get_Struct(self, xmlEntity, node); if(!node->SystemID) return Qnil; return NOKOGIRI_STR_NEW2(node->SystemID); } VALUE cNokogiriXmlEntityDecl; void init_xml_entity_decl() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); VALUE klass = rb_define_class_under(xml, "EntityDecl", node); cNokogiriXmlEntityDecl = klass; rb_define_method(klass, "original_content", original_content, 0); rb_define_method(klass, "content", get_content, 0); rb_define_method(klass, "entity_type", entity_type, 0); rb_define_method(klass, "external_id", external_id, 0); rb_define_method(klass, "system_id", system_id, 0); rb_const_set(cNokogiriXmlEntityDecl, rb_intern("INTERNAL_GENERAL"), INT2NUM(XML_INTERNAL_GENERAL_ENTITY)); rb_const_set(cNokogiriXmlEntityDecl, rb_intern("EXTERNAL_GENERAL_PARSED"), INT2NUM(XML_EXTERNAL_GENERAL_PARSED_ENTITY)); rb_const_set(cNokogiriXmlEntityDecl, rb_intern("EXTERNAL_GENERAL_UNPARSED"), INT2NUM(XML_EXTERNAL_GENERAL_UNPARSED_ENTITY)); rb_const_set(cNokogiriXmlEntityDecl, rb_intern("INTERNAL_PARAMETER"), INT2NUM(XML_INTERNAL_PARAMETER_ENTITY)); rb_const_set(cNokogiriXmlEntityDecl, rb_intern("EXTERNAL_PARAMETER"), 
INT2NUM(XML_EXTERNAL_PARAMETER_ENTITY)); rb_const_set(cNokogiriXmlEntityDecl, rb_intern("INTERNAL_PREDEFINED"), INT2NUM(XML_INTERNAL_PREDEFINED_ENTITY)); } nokogiri-1.6.1/ext/nokogiri/xml_entity_decl.h0000644000175000017500000000024512261213762020676 0ustar boutilboutil#ifndef NOKOGIRI_XML_ENTITY_DECL #define NOKOGIRI_XML_ENTITY_DECL #include void init_xml_entity_decl(); extern VALUE cNokogiriXmlEntityDecl; #endif nokogiri-1.6.1/ext/nokogiri/xml_schema.c0000644000175000017500000001072112261213762017626 0ustar boutilboutil#include static void dealloc(xmlSchemaPtr schema) { NOKOGIRI_DEBUG_START(schema); xmlSchemaFree(schema); NOKOGIRI_DEBUG_END(schema); } /* * call-seq: * validate_document(document) * * Validate a Nokogiri::XML::Document against this Schema. */ static VALUE validate_document(VALUE self, VALUE document) { xmlDocPtr doc; xmlSchemaPtr schema; xmlSchemaValidCtxtPtr valid_ctxt; VALUE errors; Data_Get_Struct(self, xmlSchema, schema); Data_Get_Struct(document, xmlDoc, doc); errors = rb_ary_new(); valid_ctxt = xmlSchemaNewValidCtxt(schema); if(NULL == valid_ctxt) { /* we have a problem */ rb_raise(rb_eRuntimeError, "Could not create a validation context"); } #ifdef HAVE_XMLSCHEMASETVALIDSTRUCTUREDERRORS xmlSchemaSetValidStructuredErrors( valid_ctxt, Nokogiri_error_array_pusher, (void *)errors ); #endif xmlSchemaValidateDoc(valid_ctxt, doc); xmlSchemaFreeValidCtxt(valid_ctxt); return errors; } /* * call-seq: * validate_file(filename) * * Validate a file against this Schema. 
*/ static VALUE validate_file(VALUE self, VALUE rb_filename) { xmlSchemaPtr schema; xmlSchemaValidCtxtPtr valid_ctxt; const char *filename ; VALUE errors; Data_Get_Struct(self, xmlSchema, schema); filename = (const char*)StringValuePtr(rb_filename) ; errors = rb_ary_new(); valid_ctxt = xmlSchemaNewValidCtxt(schema); if(NULL == valid_ctxt) { /* we have a problem */ rb_raise(rb_eRuntimeError, "Could not create a validation context"); } #ifdef HAVE_XMLSCHEMASETVALIDSTRUCTUREDERRORS xmlSchemaSetValidStructuredErrors( valid_ctxt, Nokogiri_error_array_pusher, (void *)errors ); #endif xmlSchemaValidateFile(valid_ctxt, filename, 0); xmlSchemaFreeValidCtxt(valid_ctxt); return errors; } /* * call-seq: * read_memory(string) * * Create a new Schema from the contents of +string+ */ static VALUE read_memory(VALUE klass, VALUE content) { xmlSchemaPtr schema; xmlSchemaParserCtxtPtr ctx = xmlSchemaNewMemParserCtxt( (const char *)StringValuePtr(content), (int)RSTRING_LEN(content) ); VALUE rb_schema; VALUE errors = rb_ary_new(); xmlSetStructuredErrorFunc((void *)errors, Nokogiri_error_array_pusher); #ifdef HAVE_XMLSCHEMASETPARSERSTRUCTUREDERRORS xmlSchemaSetParserStructuredErrors( ctx, Nokogiri_error_array_pusher, (void *)errors ); #endif schema = xmlSchemaParse(ctx); xmlSetStructuredErrorFunc(NULL, NULL); xmlSchemaFreeParserCtxt(ctx); if(NULL == schema) { xmlErrorPtr error = xmlGetLastError(); if(error) Nokogiri_error_raise(NULL, error); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } rb_schema = Data_Wrap_Struct(klass, 0, dealloc, schema); rb_iv_set(rb_schema, "@errors", errors); return rb_schema; } /* * call-seq: * from_document(doc) * * Create a new Schema from the Nokogiri::XML::Document +doc+ */ static VALUE from_document(VALUE klass, VALUE document) { xmlDocPtr doc; xmlSchemaParserCtxtPtr ctx; xmlSchemaPtr schema; VALUE errors; VALUE rb_schema; Data_Get_Struct(document, xmlDoc, doc); /* In case someone passes us a node. ugh. 
*/ doc = doc->doc; ctx = xmlSchemaNewDocParserCtxt(doc); errors = rb_ary_new(); xmlSetStructuredErrorFunc((void *)errors, Nokogiri_error_array_pusher); #ifdef HAVE_XMLSCHEMASETPARSERSTRUCTUREDERRORS xmlSchemaSetParserStructuredErrors( ctx, Nokogiri_error_array_pusher, (void *)errors ); #endif schema = xmlSchemaParse(ctx); xmlSetStructuredErrorFunc(NULL, NULL); xmlSchemaFreeParserCtxt(ctx); if(NULL == schema) { xmlErrorPtr error = xmlGetLastError(); if(error) Nokogiri_error_raise(NULL, error); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } rb_schema = Data_Wrap_Struct(klass, 0, dealloc, schema); rb_iv_set(rb_schema, "@errors", errors); return rb_schema; } VALUE cNokogiriXmlSchema; void init_xml_schema() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE klass = rb_define_class_under(xml, "Schema", rb_cObject); cNokogiriXmlSchema = klass; rb_define_singleton_method(klass, "read_memory", read_memory, 1); rb_define_singleton_method(klass, "from_document", from_document, 1); rb_define_private_method(klass, "validate_document", validate_document, 1); rb_define_private_method(klass, "validate_file", validate_file, 1); } nokogiri-1.6.1/ext/nokogiri/xml_namespace.h0000644000175000017500000000044612261213762020332 0ustar boutilboutil#ifndef NOKOGIRI_XML_NAMESPACE #define NOKOGIRI_XML_NAMESPACE #include void init_xml_namespace(); extern VALUE cNokogiriXmlNamespace ; VALUE Nokogiri_wrap_xml_namespace(xmlDocPtr doc, xmlNsPtr node) ; VALUE Nokogiri_wrap_xml_namespace2(VALUE document, xmlNsPtr node) ; #endif nokogiri-1.6.1/ext/nokogiri/xml_sax_parser.c0000644000175000017500000002044212261213762020536 0ustar boutilboutil#include int vasprintf (char **strp, const char *fmt, va_list ap); void vasprintf_free (void *p); static ID id_start_document, id_end_document, id_start_element, id_end_element; static ID id_start_element_namespace, id_end_element_namespace; static ID id_comment, 
id_characters, id_xmldecl, id_error, id_warning; static ID id_cdata_block, id_cAttribute; static ID id_processing_instruction; #define STRING_OR_NULL(str) \ (RTEST(str) ? StringValuePtr(str) : NULL) static void start_document(void * ctx) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); xmlParserCtxtPtr ctxt = NOKOGIRI_SAX_CTXT(ctx); if(NULL != ctxt && ctxt->html != 1) { if(ctxt->standalone != -1) { /* -1 means there was no declaration */ VALUE encoding = ctxt->encoding ? NOKOGIRI_STR_NEW2(ctxt->encoding) : Qnil; VALUE version = ctxt->version ? NOKOGIRI_STR_NEW2(ctxt->version) : Qnil; VALUE standalone = Qnil; switch(ctxt->standalone) { case 0: standalone = NOKOGIRI_STR_NEW2("no"); break; case 1: standalone = NOKOGIRI_STR_NEW2("yes"); break; } rb_funcall(doc, id_xmldecl, 3, version, encoding, standalone); } } rb_funcall(doc, id_start_document, 0); } static void end_document(void * ctx) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); rb_funcall(doc, id_end_document, 0); } static void start_element(void * ctx, const xmlChar *name, const xmlChar **atts) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); VALUE attributes = rb_ary_new(); const xmlChar * attr; int i = 0; if(atts) { while((attr = atts[i]) != NULL) { const xmlChar * val = atts[i+1]; VALUE value = val != NULL ? 
NOKOGIRI_STR_NEW2(val) : Qnil; rb_ary_push(attributes, rb_ary_new3(2, NOKOGIRI_STR_NEW2(attr), value)); i+=2; } } rb_funcall( doc, id_start_element, 2, NOKOGIRI_STR_NEW2(name), attributes ); } static void end_element(void * ctx, const xmlChar *name) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); rb_funcall(doc, id_end_element, 1, NOKOGIRI_STR_NEW2(name)); } static VALUE attributes_as_list( VALUE self, int nb_attributes, const xmlChar ** attributes) { VALUE list = rb_ary_new2((long)nb_attributes); VALUE attr_klass = rb_const_get(cNokogiriXmlSaxParser, id_cAttribute); if (attributes) { /* Each attribute is an array of [localname, prefix, URI, value, end] */ int i; for (i = 0; i < nb_attributes * 5; i += 5) { VALUE argv[4], attribute; argv[0] = RBSTR_OR_QNIL(attributes[i + 0]); /* localname */ argv[1] = RBSTR_OR_QNIL(attributes[i + 1]); /* prefix */ argv[2] = RBSTR_OR_QNIL(attributes[i + 2]); /* URI */ /* value */ argv[3] = NOKOGIRI_STR_NEW((const char*)attributes[i+3], (attributes[i+4] - attributes[i+3])); attribute = rb_class_new_instance(4, argv, attr_klass); rb_ary_push(list, attribute); } } return list; } static void start_element_ns ( void * ctx, const xmlChar * localname, const xmlChar * prefix, const xmlChar * uri, int nb_namespaces, const xmlChar ** namespaces, int nb_attributes, int nb_defaulted, const xmlChar ** attributes) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); VALUE attribute_list = attributes_as_list(self, nb_attributes, attributes); VALUE ns_list = rb_ary_new2((long)nb_namespaces); if (namespaces) { int i; for (i = 0; i < nb_namespaces * 2; i += 2) { rb_ary_push(ns_list, rb_ary_new3((long)2, RBSTR_OR_QNIL(namespaces[i + 0]), RBSTR_OR_QNIL(namespaces[i + 1]) ) ); } } rb_funcall( doc, id_start_element_namespace, 5, NOKOGIRI_STR_NEW2(localname), attribute_list, RBSTR_OR_QNIL(prefix), RBSTR_OR_QNIL(uri), ns_list ); } /** * end_element_ns was borrowed heavily from libxml-ruby. 
*/ static void end_element_ns ( void * ctx, const xmlChar * localname, const xmlChar * prefix, const xmlChar * uri) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); rb_funcall(doc, id_end_element_namespace, 3, NOKOGIRI_STR_NEW2(localname), RBSTR_OR_QNIL(prefix), RBSTR_OR_QNIL(uri) ); } static void characters_func(void * ctx, const xmlChar * ch, int len) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); VALUE str = NOKOGIRI_STR_NEW(ch, len); rb_funcall(doc, id_characters, 1, str); } static void comment_func(void * ctx, const xmlChar * value) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); VALUE str = NOKOGIRI_STR_NEW2(value); rb_funcall(doc, id_comment, 1, str); } static void warning_func(void * ctx, const char *msg, ...) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); char * message; VALUE ruby_message; va_list args; va_start(args, msg); vasprintf(&message, msg, args); va_end(args); ruby_message = NOKOGIRI_STR_NEW2(message); vasprintf_free(message); rb_funcall(doc, id_warning, 1, ruby_message); } static void error_func(void * ctx, const char *msg, ...) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); char * message; VALUE ruby_message; va_list args; va_start(args, msg); vasprintf(&message, msg, args); va_end(args); ruby_message = NOKOGIRI_STR_NEW2(message); vasprintf_free(message); rb_funcall(doc, id_error, 1, ruby_message); } static void cdata_block(void * ctx, const xmlChar * value, int len) { VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); VALUE string = NOKOGIRI_STR_NEW(value, len); rb_funcall(doc, id_cdata_block, 1, string); } static void processing_instruction(void * ctx, const xmlChar * name, const xmlChar * content) { VALUE rb_content; VALUE self = NOKOGIRI_SAX_SELF(ctx); VALUE doc = rb_iv_get(self, "@document"); rb_content = content ? 
NOKOGIRI_STR_NEW2(content) : Qnil; rb_funcall( doc, id_processing_instruction, 2, NOKOGIRI_STR_NEW2(name), rb_content ); } static void deallocate(xmlSAXHandlerPtr handler) { NOKOGIRI_DEBUG_START(handler); free(handler); NOKOGIRI_DEBUG_END(handler); } static VALUE allocate(VALUE klass) { xmlSAXHandlerPtr handler = calloc((size_t)1, sizeof(xmlSAXHandler)); xmlSetStructuredErrorFunc(NULL, NULL); handler->startDocument = start_document; handler->endDocument = end_document; handler->startElement = start_element; handler->endElement = end_element; handler->startElementNs = start_element_ns; handler->endElementNs = end_element_ns; handler->characters = characters_func; handler->comment = comment_func; handler->warning = warning_func; handler->error = error_func; handler->cdataBlock = cdata_block; handler->processingInstruction = processing_instruction; handler->initialized = XML_SAX2_MAGIC; return Data_Wrap_Struct(klass, NULL, deallocate, handler); } VALUE cNokogiriXmlSaxParser ; void init_xml_sax_parser() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE sax = rb_define_module_under(xml, "SAX"); VALUE klass = rb_define_class_under(sax, "Parser", rb_cObject); cNokogiriXmlSaxParser = klass; rb_define_alloc_func(klass, allocate); id_start_document = rb_intern("start_document"); id_end_document = rb_intern("end_document"); id_start_element = rb_intern("start_element"); id_end_element = rb_intern("end_element"); id_comment = rb_intern("comment"); id_characters = rb_intern("characters"); id_xmldecl = rb_intern("xmldecl"); id_error = rb_intern("error"); id_warning = rb_intern("warning"); id_cdata_block = rb_intern("cdata_block"); id_cAttribute = rb_intern("Attribute"); id_start_element_namespace = rb_intern("start_element_namespace"); id_end_element_namespace = rb_intern("end_element_namespace"); id_processing_instruction = rb_intern("processing_instruction"); } 
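The attributes_as_list function above reads the SAX2 attribute array in groups of five pointers, [localname, prefix, URI, value, end], and takes the value length as the difference between the end and value pointers, because the value is not NUL-terminated. A minimal standalone sketch of that pointer arithmetic follows. The helper name attr_value_dup is hypothetical and not part of the extension; plain char pointers stand in for xmlChar so the sketch compiles without libxml2.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of nokogiri): copy the value of the
 * attribute at position `index` out of a flat SAX2-style array.
 * Each attribute occupies five consecutive slots:
 *   [localname, prefix, URI, value, end]
 * The value is not NUL-terminated; its length is (end - value).
 * Returns a malloc'd, NUL-terminated copy, or NULL on allocation failure.
 */
static char *attr_value_dup(const char **attributes, int index)
{
    const char *start = attributes[index * 5 + 3];
    const char *end   = attributes[index * 5 + 4];
    size_t len = (size_t)(end - start);

    char *copy = malloc(len + 1);
    if (copy == NULL) return NULL;

    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}
```

The caller owns the returned buffer and releases it with free(). The extension itself avoids the intermediate copy by handing the pointer and length straight to NOKOGIRI_STR_NEW.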
nokogiri-1.6.1/ext/nokogiri/xml_text.h0000644000175000017500000000021212261213762017351 0ustar boutilboutil#ifndef NOKOGIRI_XML_TEXT #define NOKOGIRI_XML_TEXT #include void init_xml_text(); extern VALUE cNokogiriXmlText ; #endif nokogiri-1.6.1/ext/nokogiri/xml_dtd.c0000644000175000017500000001003112261213762017133 0ustar boutilboutil#include static void notation_copier(void *payload, void *data, xmlChar *name) { VALUE hash = (VALUE)data; VALUE klass = rb_const_get(mNokogiriXml, rb_intern("Notation")); xmlNotationPtr c_notation = (xmlNotationPtr)payload; VALUE notation; VALUE argv[3]; argv[0] = (c_notation->name ? NOKOGIRI_STR_NEW2(c_notation->name) : Qnil); argv[1] = (c_notation->PublicID ? NOKOGIRI_STR_NEW2(c_notation->PublicID) : Qnil); argv[2] = (c_notation->SystemID ? NOKOGIRI_STR_NEW2(c_notation->SystemID) : Qnil); notation = rb_class_new_instance(3, argv, klass); rb_hash_aset(hash, NOKOGIRI_STR_NEW2(name),notation); } static void element_copier(void *_payload, void *data, xmlChar *name) { VALUE hash = (VALUE)data; xmlNodePtr payload = (xmlNodePtr)_payload; VALUE element = Nokogiri_wrap_xml_node(Qnil, payload); rb_hash_aset(hash, NOKOGIRI_STR_NEW2(name), element); } /* * call-seq: * entities * * Get a hash of the entities for this DTD. */ static VALUE entities(VALUE self) { xmlDtdPtr dtd; VALUE hash; Data_Get_Struct(self, xmlDtd, dtd); if(!dtd->entities) return Qnil; hash = rb_hash_new(); xmlHashScan((xmlHashTablePtr)dtd->entities, element_copier, (void *)hash); return hash; } /* * call-seq: * notations * * Get a hash of the notations for this DTD. */ static VALUE notations(VALUE self) { xmlDtdPtr dtd; VALUE hash; Data_Get_Struct(self, xmlDtd, dtd); if(!dtd->notations) return Qnil; hash = rb_hash_new(); xmlHashScan((xmlHashTablePtr)dtd->notations, notation_copier, (void *)hash); return hash; } /* * call-seq: * attributes * * Get a hash of the attributes for this DTD.
*/ static VALUE attributes(VALUE self) { xmlDtdPtr dtd; VALUE hash; Data_Get_Struct(self, xmlDtd, dtd); hash = rb_hash_new(); if(!dtd->attributes) return hash; xmlHashScan((xmlHashTablePtr)dtd->attributes, element_copier, (void *)hash); return hash; } /* * call-seq: * elements * * Get a hash of the elements for this DTD. */ static VALUE elements(VALUE self) { xmlDtdPtr dtd; VALUE hash; Data_Get_Struct(self, xmlDtd, dtd); if(!dtd->elements) return Qnil; hash = rb_hash_new(); xmlHashScan((xmlHashTablePtr)dtd->elements, element_copier, (void *)hash); return hash; } /* * call-seq: * validate(document) * * Validate +document+ returning a list of errors */ static VALUE validate(VALUE self, VALUE document) { xmlDocPtr doc; xmlDtdPtr dtd; xmlValidCtxtPtr ctxt; VALUE error_list; Data_Get_Struct(self, xmlDtd, dtd); Data_Get_Struct(document, xmlDoc, doc); error_list = rb_ary_new(); ctxt = xmlNewValidCtxt(); xmlSetStructuredErrorFunc((void *)error_list, Nokogiri_error_array_pusher); xmlValidateDtd(ctxt, doc, dtd); xmlSetStructuredErrorFunc(NULL, NULL); xmlFreeValidCtxt(ctxt); return error_list; } /* * call-seq: * system_id * * Get the System ID for this DTD */ static VALUE system_id(VALUE self) { xmlDtdPtr dtd; Data_Get_Struct(self, xmlDtd, dtd); if(!dtd->SystemID) return Qnil; return NOKOGIRI_STR_NEW2(dtd->SystemID); } /* * call-seq: * external_id * * Get the External ID for this DTD */ static VALUE external_id(VALUE self) { xmlDtdPtr dtd; Data_Get_Struct(self, xmlDtd, dtd); if(!dtd->ExternalID) return Qnil; return NOKOGIRI_STR_NEW2(dtd->ExternalID); } VALUE cNokogiriXmlDtd; void init_xml_dtd() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); /* * Nokogiri::XML::DTD wraps DTD nodes in an XML document */ VALUE klass = rb_define_class_under(xml, "DTD", node); cNokogiriXmlDtd = klass; rb_define_method(klass, "notations", notations, 0); rb_define_method(klass, 
"elements", elements, 0); rb_define_method(klass, "entities", entities, 0); rb_define_method(klass, "validate", validate, 1); rb_define_method(klass, "attributes", attributes, 0); rb_define_method(klass, "system_id", system_id, 0); rb_define_method(klass, "external_id", external_id, 0); } nokogiri-1.6.1/ext/nokogiri/xml_io.h0000644000175000017500000000040112261213762016774 0ustar boutilboutil#ifndef NOKOGIRI_XML_IO #define NOKOGIRI_XML_IO #include int io_read_callback(void * ctx, char * buffer, int len); int io_write_callback(void * ctx, char * buffer, int len); int io_close_callback(void * ctx); void init_nokogiri_io(); #endif nokogiri-1.6.1/ext/nokogiri/xml_namespace.c0000644000175000017500000000300212261213762020314 0ustar boutilboutil#include VALUE cNokogiriXmlNamespace ; /* * call-seq: * prefix * * Get the prefix for this namespace. Returns +nil+ if there is no prefix. */ static VALUE prefix(VALUE self) { xmlNsPtr ns; Data_Get_Struct(self, xmlNs, ns); if(!ns->prefix) return Qnil; return NOKOGIRI_STR_NEW2(ns->prefix); } /* * call-seq: * href * * Get the href for this namespace */ static VALUE href(VALUE self) { xmlNsPtr ns; Data_Get_Struct(self, xmlNs, ns); if(!ns->href) return Qnil; return NOKOGIRI_STR_NEW2(ns->href); } VALUE Nokogiri_wrap_xml_namespace(xmlDocPtr doc, xmlNsPtr node) { VALUE ns, document, node_cache; assert(doc->_private); if(node->_private) return (VALUE)node->_private; ns = Data_Wrap_Struct(cNokogiriXmlNamespace, 0, 0, node); document = DOC_RUBY_OBJECT(doc); node_cache = rb_iv_get(document, "@node_cache"); rb_ary_push(node_cache, ns); rb_iv_set(ns, "@document", DOC_RUBY_OBJECT(doc)); node->_private = (void *)ns; return ns; } VALUE Nokogiri_wrap_xml_namespace2(VALUE document, xmlNsPtr node) { xmlDocPtr doc; Data_Get_Struct(document, xmlDoc, doc) ; return Nokogiri_wrap_xml_namespace(doc, node); } void init_xml_namespace() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE klass = 
rb_define_class_under(xml, "Namespace", rb_cObject); cNokogiriXmlNamespace = klass; rb_define_method(klass, "prefix", prefix, 0); rb_define_method(klass, "href", href, 0); } nokogiri-1.6.1/ext/nokogiri/xml_cdata.c0000644000175000017500000000245012261213762017442 0ustar boutilboutil#include /* * call-seq: * new(document, content) * * Create a new CDATA element on the +document+ with +content+ */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr xml_doc; xmlNodePtr node; VALUE doc; VALUE content; VALUE rest; VALUE rb_node; rb_scan_args(argc, argv, "2*", &doc, &content, &rest); Data_Get_Struct(doc, xmlDoc, xml_doc); node = xmlNewCDataBlock( xml_doc->doc, NIL_P(content) ? NULL : (const xmlChar *)StringValuePtr(content), NIL_P(content) ? 0 : (int)RSTRING_LEN(content) ); nokogiri_root_node(node); rb_node = Nokogiri_wrap_xml_node(klass, node); rb_obj_call_init(rb_node, argc, argv); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } VALUE cNokogiriXmlCData; void init_xml_cdata() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); VALUE char_data = rb_define_class_under(xml, "CharacterData", node); VALUE text = rb_define_class_under(xml, "Text", char_data); /* * CData represents a CData node in an xml document. 
*/ VALUE klass = rb_define_class_under(xml, "CDATA", text); cNokogiriXmlCData = klass; rb_define_singleton_method(klass, "new", new, -1); } nokogiri-1.6.1/ext/nokogiri/html_sax_parser_context.c0000644000175000017500000000600312261213762022443 0ustar boutilboutil#include VALUE cNokogiriHtmlSaxParserContext ; static void deallocate(xmlParserCtxtPtr ctxt) { NOKOGIRI_DEBUG_START(ctxt); ctxt->sax = NULL; htmlFreeParserCtxt(ctxt); NOKOGIRI_DEBUG_END(ctxt); } static VALUE parse_memory(VALUE klass, VALUE data, VALUE encoding) { htmlParserCtxtPtr ctxt; if (NIL_P(data)) rb_raise(rb_eArgError, "data cannot be nil"); if (!(int)RSTRING_LEN(data)) rb_raise(rb_eRuntimeError, "data cannot be empty"); ctxt = htmlCreateMemoryParserCtxt(StringValuePtr(data), (int)RSTRING_LEN(data)); if (ctxt->sax) { xmlFree(ctxt->sax); ctxt->sax = NULL; } if (RTEST(encoding)) { xmlCharEncodingHandlerPtr enc = xmlFindCharEncodingHandler(StringValuePtr(encoding)); if (enc != NULL) { xmlSwitchToEncoding(ctxt, enc); if (ctxt->errNo == XML_ERR_UNSUPPORTED_ENCODING) { rb_raise(rb_eRuntimeError, "Unsupported encoding %s", StringValuePtr(encoding)); } } } return Data_Wrap_Struct(klass, NULL, deallocate, ctxt); } static VALUE parse_file(VALUE klass, VALUE filename, VALUE encoding) { htmlParserCtxtPtr ctxt = htmlCreateFileParserCtxt( StringValuePtr(filename), StringValuePtr(encoding) ); return Data_Wrap_Struct(klass, NULL, deallocate, ctxt); } static VALUE parse_doc(VALUE ctxt_val) { htmlParserCtxtPtr ctxt = (htmlParserCtxtPtr)ctxt_val; htmlParseDocument(ctxt); return Qnil; } static VALUE parse_doc_finalize(VALUE ctxt_val) { htmlParserCtxtPtr ctxt = (htmlParserCtxtPtr)ctxt_val; if (ctxt->myDoc) xmlFreeDoc(ctxt->myDoc); NOKOGIRI_SAX_TUPLE_DESTROY(ctxt->userData); return Qnil; } static VALUE parse_with(VALUE self, VALUE sax_handler) { htmlParserCtxtPtr ctxt; htmlSAXHandlerPtr sax; if (!rb_obj_is_kind_of(sax_handler, cNokogiriXmlSaxParser)) rb_raise(rb_eArgError, "argument must be a
Nokogiri::XML::SAX::Parser"); Data_Get_Struct(self, htmlParserCtxt, ctxt); Data_Get_Struct(sax_handler, htmlSAXHandler, sax); /* Free the sax handler since we'll assign our own */ if (ctxt->sax && ctxt->sax != (xmlSAXHandlerPtr)&xmlDefaultSAXHandler) xmlFree(ctxt->sax); ctxt->sax = sax; ctxt->userData = (void *)NOKOGIRI_SAX_TUPLE_NEW(ctxt, sax_handler); rb_ensure(parse_doc, (VALUE)ctxt, parse_doc_finalize, (VALUE)ctxt); return self; } void init_html_sax_parser_context() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE html = rb_define_module_under(nokogiri, "HTML"); VALUE sax = rb_define_module_under(xml, "SAX"); VALUE hsax = rb_define_module_under(html, "SAX"); VALUE pc = rb_define_class_under(sax, "ParserContext", rb_cObject); VALUE klass = rb_define_class_under(hsax, "ParserContext", pc); cNokogiriHtmlSaxParserContext = klass; rb_define_singleton_method(klass, "memory", parse_memory, 2); rb_define_singleton_method(klass, "file", parse_file, 2); rb_define_method(klass, "parse_with", parse_with, 1); } nokogiri-1.6.1/ext/nokogiri/xml_sax_parser.h0000644000175000017500000000151112261213762020537 0ustar boutilboutil#ifndef NOKOGIRI_XML_SAX_PARSER #define NOKOGIRI_XML_SAX_PARSER #include void init_xml_sax_parser(); extern VALUE cNokogiriXmlSaxParser ; typedef struct _nokogiriSAXTuple { xmlParserCtxtPtr ctxt; VALUE self; } nokogiriSAXTuple; typedef nokogiriSAXTuple * nokogiriSAXTuplePtr; #define NOKOGIRI_SAX_SELF(_ctxt) \ ((nokogiriSAXTuplePtr)(_ctxt))->self #define NOKOGIRI_SAX_CTXT(_ctxt) \ ((nokogiriSAXTuplePtr)(_ctxt))->ctxt #define NOKOGIRI_SAX_TUPLE_NEW(_ctxt, _self) \ nokogiri_sax_tuple_new(_ctxt, _self) static inline nokogiriSAXTuplePtr nokogiri_sax_tuple_new(xmlParserCtxtPtr ctxt, VALUE self) { nokogiriSAXTuplePtr tuple = malloc(sizeof(nokogiriSAXTuple)); tuple->self = self; tuple->ctxt = ctxt; return tuple; } #define NOKOGIRI_SAX_TUPLE_DESTROY(_tuple) \ free(_tuple) \ #endif 
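The header above pairs the parser context with the Ruby handler object in a small heap-allocated tuple, which is then passed to libxml2 as the opaque user-data pointer and unpacked again inside each callback through NOKOGIRI_SAX_SELF. The following is a minimal sketch of that pattern, assuming stand-in types (parser_ctxt_t, value_t) in place of xmlParserCtxtPtr and VALUE so it compiles alone. The names sax_tuple_new and callback_self are hypothetical, and the NULL check on malloc is an addition that the original inline function does not perform.

```c
#include <stdlib.h>

/* Stand-ins assumed for this sketch only; the extension uses
 * xmlParserCtxtPtr and Ruby's VALUE here. */
typedef void *parser_ctxt_t;
typedef unsigned long value_t;

typedef struct {
    parser_ctxt_t ctxt;  /* the parser context driving the parse */
    value_t self;        /* the handler object that receives callbacks */
} sax_tuple;

/* Hypothetical constructor: like the header's inline function, but it
 * reports allocation failure by returning NULL instead of writing
 * through a NULL pointer. */
static sax_tuple *sax_tuple_new(parser_ctxt_t ctxt, value_t self)
{
    sax_tuple *tuple = malloc(sizeof *tuple);
    if (tuple == NULL) return NULL;
    tuple->ctxt = ctxt;
    tuple->self = self;
    return tuple;
}

/* What a callback does with its opaque user-data pointer: cast it back
 * to the tuple and pull out the handler object. */
static value_t callback_self(void *user_data)
{
    return ((sax_tuple *)user_data)->self;
}
```

The tuple is released with free() once parsing finishes, which is what NOKOGIRI_SAX_TUPLE_DESTROY expands to.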
nokogiri-1.6.1/ext/nokogiri/xml_xpath_context.h0000644000175000017500000000046712261213762021271 0ustar boutilboutil#ifndef NOKOGIRI_XML_XPATH_CONTEXT #define NOKOGIRI_XML_XPATH_CONTEXT #include void init_xml_xpath_context(); void Nokogiri_marshal_xpath_funcall_and_return_values(xmlXPathParserContextPtr ctx, int nargs, VALUE handler, const char* function_name) ; extern VALUE cNokogiriXmlXpathContext; #endif nokogiri-1.6.1/ext/nokogiri/xml_reader.c0000644000175000017500000003522312261213762017634 0ustar boutilboutil#include static void dealloc(xmlTextReaderPtr reader) { NOKOGIRI_DEBUG_START(reader); xmlFreeTextReader(reader); NOKOGIRI_DEBUG_END(reader); } static int has_attributes(xmlTextReaderPtr reader) { /* * this implementation of xmlTextReaderHasAttributes explicitly includes * namespaces and properties, because some earlier versions ignore * namespaces. */ xmlNodePtr node ; node = xmlTextReaderCurrentNode(reader); if (node == NULL) return(0); if ((node->type == XML_ELEMENT_NODE) && ((node->properties != NULL) || (node->nsDef != NULL))) return(1); return(0); } static void Nokogiri_xml_node_namespaces(xmlNodePtr node, VALUE attr_hash) { xmlNsPtr ns; static char buffer[XMLNS_BUFFER_LEN] ; char *key ; size_t keylen ; if (node->type != XML_ELEMENT_NODE) return ; ns = node->nsDef; while (ns != NULL) { keylen = XMLNS_PREFIX_LEN + (ns->prefix ? (strlen((const char*)ns->prefix) + 1) : 0) ; if (keylen > XMLNS_BUFFER_LEN) { key = (char*)malloc(keylen) ; } else { key = buffer ; } if (ns->prefix) { sprintf(key, "%s:%s", XMLNS_PREFIX, ns->prefix); } else { sprintf(key, "%s", XMLNS_PREFIX); } rb_hash_aset(attr_hash, NOKOGIRI_STR_NEW2(key), (ns->href ? NOKOGIRI_STR_NEW2(ns->href) : Qnil) ); if (key != buffer) { free(key); } ns = ns->next ; } } /* * call-seq: * default? * * Was an attribute generated from the default value in the DTD or schema? 
*/ static VALUE default_eh(VALUE self) { xmlTextReaderPtr reader; int eh; Data_Get_Struct(self, xmlTextReader, reader); eh = xmlTextReaderIsDefault(reader); if(eh == 0) return Qfalse; if(eh == 1) return Qtrue; return Qnil; } /* * call-seq: * value? * * Does this node have a text value? */ static VALUE value_eh(VALUE self) { xmlTextReaderPtr reader; int eh; Data_Get_Struct(self, xmlTextReader, reader); eh = xmlTextReaderHasValue(reader); if(eh == 0) return Qfalse; if(eh == 1) return Qtrue; return Qnil; } /* * call-seq: * attributes? * * Does this node have attributes? */ static VALUE attributes_eh(VALUE self) { xmlTextReaderPtr reader; int eh; Data_Get_Struct(self, xmlTextReader, reader); eh = has_attributes(reader); if(eh == 0) return Qfalse; if(eh == 1) return Qtrue; return Qnil; } /* * call-seq: * namespaces * * Get a hash of namespaces for this Node */ static VALUE namespaces(VALUE self) { xmlTextReaderPtr reader; xmlNodePtr ptr; VALUE attr ; Data_Get_Struct(self, xmlTextReader, reader); attr = rb_hash_new() ; if (! has_attributes(reader)) return attr ; ptr = xmlTextReaderExpand(reader); if(ptr == NULL) return Qnil; Nokogiri_xml_node_namespaces(ptr, attr); return attr ; } /* * call-seq: * attribute_nodes * * Get a list of attributes for this Node */ static VALUE attribute_nodes(VALUE self) { xmlTextReaderPtr reader; xmlNodePtr ptr; VALUE attr ; Data_Get_Struct(self, xmlTextReader, reader); attr = rb_ary_new() ; if (! 
has_attributes(reader)) return attr ; ptr = xmlTextReaderExpand(reader); if(ptr == NULL) return Qnil; Nokogiri_xml_node_properties(ptr, attr); return attr ; } /* * call-seq: * attribute_at(index) * * Get the value of attribute at +index+ */ static VALUE attribute_at(VALUE self, VALUE index) { xmlTextReaderPtr reader; xmlChar *value; VALUE rb_value; Data_Get_Struct(self, xmlTextReader, reader); if(NIL_P(index)) return Qnil; index = rb_Integer(index); value = xmlTextReaderGetAttributeNo( reader, (int)NUM2INT(index) ); if(value == NULL) return Qnil; rb_value = NOKOGIRI_STR_NEW2(value); xmlFree(value); return rb_value; } /* * call-seq: * attribute(name) * * Get the value of attribute named +name+ */ static VALUE reader_attribute(VALUE self, VALUE name) { xmlTextReaderPtr reader; xmlChar *value ; VALUE rb_value; Data_Get_Struct(self, xmlTextReader, reader); if(NIL_P(name)) return Qnil; name = StringValue(name) ; value = xmlTextReaderGetAttribute(reader, (xmlChar*)StringValuePtr(name)); if(value == NULL) { /* this section is an attempt to workaround older versions of libxml that don't handle namespaces properly in all attribute-and-friends functions */ xmlChar *prefix = NULL ; xmlChar *localname = xmlSplitQName2((xmlChar*)StringValuePtr(name), &prefix); if (localname != NULL) { value = xmlTextReaderLookupNamespace(reader, localname); xmlFree(localname) ; } else { value = xmlTextReaderLookupNamespace(reader, prefix); } xmlFree(prefix); } if(value == NULL) return Qnil; rb_value = NOKOGIRI_STR_NEW2(value); xmlFree(value); return rb_value; } /* * call-seq: * attribute_count * * Get the number of attributes for the current node */ static VALUE attribute_count(VALUE self) { xmlTextReaderPtr reader; int count; Data_Get_Struct(self, xmlTextReader, reader); count = xmlTextReaderAttributeCount(reader); if(count == -1) return Qnil; return INT2NUM((long)count); } /* * call-seq: * depth * * Get the depth of the node */ static VALUE depth(VALUE self) { xmlTextReaderPtr reader; int 
depth; Data_Get_Struct(self, xmlTextReader, reader); depth = xmlTextReaderDepth(reader); if(depth == -1) return Qnil; return INT2NUM((long)depth); } /* * call-seq: * xml_version * * Get the XML version of the document being read */ static VALUE xml_version(VALUE self) { xmlTextReaderPtr reader; const char *version; Data_Get_Struct(self, xmlTextReader, reader); version = (const char *)xmlTextReaderConstXmlVersion(reader); if(version == NULL) return Qnil; return NOKOGIRI_STR_NEW2(version); } /* * call-seq: * lang * * Get the xml:lang scope within which the node resides. */ static VALUE lang(VALUE self) { xmlTextReaderPtr reader; const char *lang; Data_Get_Struct(self, xmlTextReader, reader); lang = (const char *)xmlTextReaderConstXmlLang(reader); if(lang == NULL) return Qnil; return NOKOGIRI_STR_NEW2(lang); } /* * call-seq: * value * * Get the text value of the node if present. Returns a utf-8 encoded string. */ static VALUE value(VALUE self) { xmlTextReaderPtr reader; const char *value; Data_Get_Struct(self, xmlTextReader, reader); value = (const char *)xmlTextReaderConstValue(reader); if(value == NULL) return Qnil; return NOKOGIRI_STR_NEW2(value); } /* * call-seq: * prefix * * Get the shorthand reference to the namespace associated with the node. 
*/ static VALUE prefix(VALUE self) { xmlTextReaderPtr reader; const char *prefix; Data_Get_Struct(self, xmlTextReader, reader); prefix = (const char *)xmlTextReaderConstPrefix(reader); if(prefix == NULL) return Qnil; return NOKOGIRI_STR_NEW2(prefix); } /* * call-seq: * namespace_uri * * Get the URI defining the namespace associated with the node */ static VALUE namespace_uri(VALUE self) { xmlTextReaderPtr reader; const char *uri; Data_Get_Struct(self, xmlTextReader, reader); uri = (const char *)xmlTextReaderConstNamespaceUri(reader); if(uri == NULL) return Qnil; return NOKOGIRI_STR_NEW2(uri); } /* * call-seq: * local_name * * Get the local name of the node */ static VALUE local_name(VALUE self) { xmlTextReaderPtr reader; const char *name; Data_Get_Struct(self, xmlTextReader, reader); name = (const char *)xmlTextReaderConstLocalName(reader); if(name == NULL) return Qnil; return NOKOGIRI_STR_NEW2(name); } /* * call-seq: * name * * Get the name of the node. Returns a utf-8 encoded string. */ static VALUE name(VALUE self) { xmlTextReaderPtr reader; const char *name; Data_Get_Struct(self, xmlTextReader, reader); name = (const char *)xmlTextReaderConstName(reader); if(name == NULL) return Qnil; return NOKOGIRI_STR_NEW2(name); } /* * call-seq: * base_uri * * Get the xml:base of the node */ static VALUE base_uri(VALUE self) { xmlTextReaderPtr reader; xmlChar * base_uri; VALUE rb_base_uri; Data_Get_Struct(self, xmlTextReader, reader); /* xmlTextReaderBaseUri returns a newly allocated string that the caller must free */ base_uri = xmlTextReaderBaseUri(reader); if (base_uri == NULL) return Qnil; rb_base_uri = NOKOGIRI_STR_NEW2(base_uri); xmlFree(base_uri); return rb_base_uri; } /* * call-seq: * state * * Get the state of the reader */ static VALUE state(VALUE self) { xmlTextReaderPtr reader; Data_Get_Struct(self, xmlTextReader, reader); return INT2NUM((long)xmlTextReaderReadState(reader)); } /* * call-seq: * node_type * * Get the type of the reader's current node */ static VALUE node_type(VALUE self) { xmlTextReaderPtr reader; Data_Get_Struct(self, xmlTextReader, reader); return
INT2NUM((long)xmlTextReaderNodeType(reader)); } /* * call-seq: * read * * Move the Reader forward through the XML document. */ static VALUE read_more(VALUE self) { xmlTextReaderPtr reader; xmlErrorPtr error; VALUE error_list; int ret; Data_Get_Struct(self, xmlTextReader, reader); error_list = rb_funcall(self, rb_intern("errors"), 0); xmlSetStructuredErrorFunc((void *)error_list, Nokogiri_error_array_pusher); ret = xmlTextReaderRead(reader); xmlSetStructuredErrorFunc(NULL, NULL); if(ret == 1) return self; if(ret == 0) return Qnil; error = xmlGetLastError(); if(error) rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); else rb_raise(rb_eRuntimeError, "Error pulling: %d", ret); return Qnil; } /* * call-seq: * inner_xml * * Read the contents of the current node, including child nodes and markup. * Returns a utf-8 encoded string. */ static VALUE inner_xml(VALUE self) { xmlTextReaderPtr reader; xmlChar* value; VALUE str; Data_Get_Struct(self, xmlTextReader, reader); value = xmlTextReaderReadInnerXml(reader); str = Qnil; if(value) { str = NOKOGIRI_STR_NEW2((char*)value); xmlFree(value); } return str; } /* * call-seq: * outer_xml * * Read the current node and its contents, including child nodes and markup. * Returns a utf-8 encoded string. 
*/ static VALUE outer_xml(VALUE self) { xmlTextReaderPtr reader; xmlChar *value; VALUE str = Qnil; Data_Get_Struct(self, xmlTextReader, reader); value = xmlTextReaderReadOuterXml(reader); if(value) { str = NOKOGIRI_STR_NEW2((char*)value); xmlFree(value); } return str; } /* * call-seq: * from_memory(string, url = nil, encoding = nil, options = 0) * * Create a new reader that parses +string+ */ static VALUE from_memory(int argc, VALUE *argv, VALUE klass) { VALUE rb_buffer, rb_url, encoding, rb_options; xmlTextReaderPtr reader; const char * c_url = NULL; const char * c_encoding = NULL; int c_options = 0; VALUE rb_reader, args[3]; rb_scan_args(argc, argv, "13", &rb_buffer, &rb_url, &encoding, &rb_options); if (!RTEST(rb_buffer)) rb_raise(rb_eArgError, "string cannot be nil"); if (RTEST(rb_url)) c_url = StringValuePtr(rb_url); if (RTEST(encoding)) c_encoding = StringValuePtr(encoding); if (RTEST(rb_options)) c_options = (int)NUM2INT(rb_options); reader = xmlReaderForMemory( StringValuePtr(rb_buffer), (int)RSTRING_LEN(rb_buffer), c_url, c_encoding, c_options ); if(reader == NULL) { xmlFreeTextReader(reader); rb_raise(rb_eRuntimeError, "couldn't create a parser"); } rb_reader = Data_Wrap_Struct(klass, NULL, dealloc, reader); args[0] = rb_buffer; args[1] = rb_url; args[2] = encoding; rb_obj_call_init(rb_reader, 3, args); return rb_reader; } /* * call-seq: * from_io(io, url = nil, encoding = nil, options = 0) * * Create a new reader that parses +io+ */ static VALUE from_io(int argc, VALUE *argv, VALUE klass) { VALUE rb_io, rb_url, encoding, rb_options; xmlTextReaderPtr reader; const char * c_url = NULL; const char * c_encoding = NULL; int c_options = 0; VALUE rb_reader, args[3]; rb_scan_args(argc, argv, "13", &rb_io, &rb_url, &encoding, &rb_options); if (!RTEST(rb_io)) rb_raise(rb_eArgError, "io cannot be nil"); if (RTEST(rb_url)) c_url = StringValuePtr(rb_url); if (RTEST(encoding)) c_encoding = StringValuePtr(encoding); if (RTEST(rb_options)) c_options = 
(int)NUM2INT(rb_options); reader = xmlReaderForIO( (xmlInputReadCallback)io_read_callback, (xmlInputCloseCallback)io_close_callback, (void *)rb_io, c_url, c_encoding, c_options ); if(reader == NULL) { xmlFreeTextReader(reader); rb_raise(rb_eRuntimeError, "couldn't create a parser"); } rb_reader = Data_Wrap_Struct(klass, NULL, dealloc, reader); args[0] = rb_io; args[1] = rb_url; args[2] = encoding; rb_obj_call_init(rb_reader, 3, args); return rb_reader; } /* * call-seq: * reader.empty_element? # => true or false * * Returns true if the current node is empty, otherwise false. */ static VALUE empty_element_p(VALUE self) { xmlTextReaderPtr reader; Data_Get_Struct(self, xmlTextReader, reader); if(xmlTextReaderIsEmptyElement(reader)) return Qtrue; return Qfalse; } VALUE cNokogiriXmlReader; void init_xml_reader() { VALUE module = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(module, "XML"); /* * The Reader parser allows you to effectively pull parse an XML document. * Once instantiated, call Nokogiri::XML::Reader#each to iterate over each * node. Note that you may only iterate over the document once! 
*/ VALUE klass = rb_define_class_under(xml, "Reader", rb_cObject); cNokogiriXmlReader = klass; rb_define_singleton_method(klass, "from_memory", from_memory, -1); rb_define_singleton_method(klass, "from_io", from_io, -1); rb_define_method(klass, "read", read_more, 0); rb_define_method(klass, "inner_xml", inner_xml, 0); rb_define_method(klass, "outer_xml", outer_xml, 0); rb_define_method(klass, "state", state, 0); rb_define_method(klass, "node_type", node_type, 0); rb_define_method(klass, "name", name, 0); rb_define_method(klass, "local_name", local_name, 0); rb_define_method(klass, "namespace_uri", namespace_uri, 0); rb_define_method(klass, "prefix", prefix, 0); rb_define_method(klass, "value", value, 0); rb_define_method(klass, "lang", lang, 0); rb_define_method(klass, "xml_version", xml_version, 0); rb_define_method(klass, "depth", depth, 0); rb_define_method(klass, "attribute_count", attribute_count, 0); rb_define_method(klass, "attribute", reader_attribute, 1); rb_define_method(klass, "namespaces", namespaces, 0); rb_define_method(klass, "attribute_at", attribute_at, 1); rb_define_method(klass, "empty_element?", empty_element_p, 0); rb_define_method(klass, "attributes?", attributes_eh, 0); rb_define_method(klass, "value?", value_eh, 0); rb_define_method(klass, "default?", default_eh, 0); rb_define_method(klass, "base_uri", base_uri, 0); rb_define_private_method(klass, "attr_nodes", attribute_nodes, 0); } nokogiri-1.6.1/ext/nokogiri/xml_encoding_handler.c0000644000175000017500000000331012261213762021645 0ustar boutilboutil#include /* * call-seq: Nokogiri::EncodingHandler.[](name) * * Get the encoding handler for +name+ */ static VALUE get(VALUE klass, VALUE key) { xmlCharEncodingHandlerPtr handler; handler = xmlFindCharEncodingHandler(StringValuePtr(key)); if(handler) return Data_Wrap_Struct(klass, NULL, NULL, handler); return Qnil; } /* * call-seq: Nokogiri::EncodingHandler.delete(name) * * Delete the encoding alias named +name+ */ static VALUE delete(VALUE 
klass, VALUE name) { if(xmlDelEncodingAlias(StringValuePtr(name))) return Qnil; return Qtrue; } /* * call-seq: Nokogiri::EncodingHandler.alias(from, to) * * Alias encoding handler with name +from+ to name +to+ */ static VALUE alias(VALUE klass, VALUE from, VALUE to) { xmlAddEncodingAlias(StringValuePtr(from), StringValuePtr(to)); return to; } /* * call-seq: Nokogiri::EncodingHandler.clear_aliases! * * Remove all encoding aliases. */ static VALUE clear_aliases(VALUE klass) { xmlCleanupEncodingAliases(); return klass; } /* * call-seq: name * * Get the name of this EncodingHandler */ static VALUE name(VALUE self) { xmlCharEncodingHandlerPtr handler; Data_Get_Struct(self, xmlCharEncodingHandler, handler); return NOKOGIRI_STR_NEW2(handler->name); } void init_xml_encoding_handler() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE klass = rb_define_class_under(nokogiri, "EncodingHandler", rb_cObject); rb_define_singleton_method(klass, "[]", get, 1); rb_define_singleton_method(klass, "delete", delete, 1); rb_define_singleton_method(klass, "alias", alias, 2); rb_define_singleton_method(klass, "clear_aliases!", clear_aliases, 0); rb_define_method(klass, "name", name, 0); } nokogiri-1.6.1/ext/nokogiri/xml_cdata.h0000644000175000017500000000021512261213762017444 0ustar boutilboutil#ifndef NOKOGIRI_XML_CDATA #define NOKOGIRI_XML_CDATA #include void init_xml_cdata(); extern VALUE cNokogiriXmlCData; #endif nokogiri-1.6.1/ext/nokogiri/xml_encoding_handler.h0000644000175000017500000000021612261213762021654 0ustar boutilboutil#ifndef NOKOGIRI_XML_ENCODING_HANDLER #define NOKOGIRI_XML_ENCODING_HANDLER #include void init_xml_encoding_handler(); #endif nokogiri-1.6.1/ext/nokogiri/extconf.rb0000644000175000017500000001401012261213762017330 0ustar boutilboutilENV['RC_ARCHS'] = '' if RUBY_PLATFORM =~ /darwin/ # :stopdoc: require 'mkmf' RbConfig::MAKEFILE_CONFIG['CC'] = ENV['CC'] if ENV['CC'] ROOT = File.expand_path(File.join(File.dirname(__FILE__), '..', '..')) LIBDIR = 
RbConfig::CONFIG['libdir'] @libdir_basename = "lib" # shrug, ruby 2.0 won't work for me. INCLUDEDIR = RbConfig::CONFIG['includedir'] if defined?(RUBY_ENGINE) && RUBY_ENGINE == 'macruby' $LIBRUBYARG_STATIC.gsub!(/-static/, '') end $CFLAGS << " #{ENV["CFLAGS"]}" $LIBS << " #{ENV["LIBS"]}" windows_p = RbConfig::CONFIG['target_os'] == 'mingw32' || RbConfig::CONFIG['target_os'] =~ /mswin/ if windows_p $CFLAGS << " -DXP_WIN -DXP_WIN32 -DUSE_INCLUDED_VASPRINTF" elsif RbConfig::CONFIG['target_os'] =~ /solaris/ $CFLAGS << " -DUSE_INCLUDED_VASPRINTF" else $CFLAGS << " -g -DXP_UNIX" end if RbConfig::MAKEFILE_CONFIG['CC'] =~ /mingw/ $CFLAGS << " -DIN_LIBXML" $LIBS << " -lz" # TODO why is this necessary? end if RbConfig::MAKEFILE_CONFIG['CC'] =~ /gcc/ $CFLAGS << " -O3" unless $CFLAGS[/-O\d/] $CFLAGS << " -Wall -Wcast-qual -Wwrite-strings -Wconversion -Wmissing-noreturn -Winline" end if windows_p # I'm cross compiling! HEADER_DIRS = [INCLUDEDIR] LIB_DIRS = [LIBDIR] XML2_HEADER_DIRS = [File.join(INCLUDEDIR, "libxml2"), INCLUDEDIR] else if ENV['NOKOGIRI_USE_SYSTEM_LIBRARIES'] HEADER_DIRS = [ # First search /opt/local for macports '/opt/local/include', # Then search /usr/local for people that installed from source '/usr/local/include', # Check the ruby install locations INCLUDEDIR, # Finally fall back to /usr '/usr/include', '/usr/include/libxml2', ] LIB_DIRS = [ # First search /opt/local for macports '/opt/local/lib', # Then search /usr/local for people that installed from source '/usr/local/lib', # Check the ruby install locations LIBDIR, # Finally fall back to /usr '/usr/lib', ] XML2_HEADER_DIRS = [ '/opt/local/include/libxml2', '/usr/local/include/libxml2', File.join(INCLUDEDIR, "libxml2") ] + HEADER_DIRS # If the user has homebrew installed, use the libxml2 inside homebrew brew_prefix = `brew --prefix libxml2 2> /dev/null`.chomp unless brew_prefix.empty? 
LIB_DIRS.unshift File.join(brew_prefix, 'lib') XML2_HEADER_DIRS.unshift File.join(brew_prefix, 'include/libxml2') end else require 'mini_portile' require 'yaml' common_recipe = lambda do |recipe| recipe.target = File.join(ROOT, "ports") recipe.files = ["ftp://ftp.xmlsoft.org/libxml2/#{recipe.name}-#{recipe.version}.tar.gz"] checkpoint = "#{recipe.target}/#{recipe.name}-#{recipe.version}-#{recipe.host}.installed" unless File.exist?(checkpoint) recipe.cook FileUtils.touch checkpoint end recipe.activate end dependencies = YAML.load_file(File.join(ROOT, "dependencies.yml")) libxml2_recipe = MiniPortile.new("libxml2", dependencies["libxml2"]).tap do |recipe| recipe.configure_options = [ "--enable-shared", "--disable-static", "--without-python", "--without-readline", "--with-c14n", "--with-debug", "--with-threads" ] common_recipe.call recipe end libxslt_recipe = MiniPortile.new("libxslt", dependencies["libxslt"]).tap do |recipe| recipe.configure_options = [ "--enable-shared", "--disable-static", "--without-python", "--without-crypto", "--with-debug", "--with-libxml-prefix=#{libxml2_recipe.path}" ] common_recipe.call recipe end $LDFLAGS << " -Wl,-rpath,#{libxml2_recipe.path}/lib" $LDFLAGS << " -Wl,-rpath,#{libxslt_recipe.path}/lib" $CFLAGS << " -DNOKOGIRI_USE_PACKAGED_LIBRARIES -DNOKOGIRI_LIBXML2_PATH='\"#{libxml2_recipe.path}\"' -DNOKOGIRI_LIBXSLT_PATH='\"#{libxslt_recipe.path}\"'" HEADER_DIRS = [libxml2_recipe, libxslt_recipe].map { |f| File.join(f.path, "include") } LIB_DIRS = [libxml2_recipe, libxslt_recipe].map { |f| File.join(f.path, "lib") } XML2_HEADER_DIRS = HEADER_DIRS + [File.join(libxml2_recipe.path, "include", "libxml2")] end end dir_config('zlib', HEADER_DIRS, LIB_DIRS) dir_config('iconv', HEADER_DIRS, LIB_DIRS) dir_config('xml2', XML2_HEADER_DIRS, LIB_DIRS) dir_config('xslt', HEADER_DIRS, LIB_DIRS) def asplode(lib) abort "-----\n#{lib} is missing. 
please visit http://nokogiri.org/tutorials/installing_nokogiri.html for help with installing dependencies.\n-----" end pkg_config('libxslt') pkg_config('libxml-2.0') pkg_config('libiconv') def have_iconv? %w{ iconv_open libiconv_open }.any? do |method| have_func(method, 'iconv.h') or have_library('iconv', method, 'iconv.h') or find_library('iconv', method, 'iconv.h') end end asplode "libxml2" unless find_header('libxml/parser.h') asplode "libxslt" unless find_header('libxslt/xslt.h') asplode "libexslt" unless find_header('libexslt/exslt.h') asplode "libiconv" unless have_iconv? asplode "libxml2" unless find_library("xml2", 'xmlParseDoc') asplode "libxslt" unless find_library("xslt", 'xsltParseStylesheetDoc') asplode "libexslt" unless find_library("exslt", 'exsltFuncRegister') unless have_func('xmlHasFeature') abort "-----\nThe function 'xmlHasFeature' is missing from your installation of libxml2. Likely this means that your installed version of libxml2 is old enough that nokogiri will not work well. To get around this problem, please upgrade your installation of libxml2. Please visit http://nokogiri.org/tutorials/installing_nokogiri.html for more help!" 
end have_func 'xmlFirstElementChild' have_func('xmlRelaxNGSetParserStructuredErrors') have_func('xmlRelaxNGSetParserStructuredErrors') have_func('xmlRelaxNGSetValidStructuredErrors') have_func('xmlSchemaSetValidStructuredErrors') have_func('xmlSchemaSetParserStructuredErrors') if ENV['CPUPROFILE'] unless find_library('profiler', 'ProfilerEnable', *LIB_DIRS) abort "google performance tools are not installed" end end create_makefile('nokogiri/nokogiri') # :startdoc: nokogiri-1.6.1/ext/nokogiri/nokogiri.h0000644000175000017500000000741212261213762017337 0ustar boutilboutil#ifndef NOKOGIRI_NATIVE #define NOKOGIRI_NATIVE #include #include #include #include #ifdef USE_INCLUDED_VASPRINTF int vasprintf (char **strp, const char *fmt, va_list ap); #else #define _GNU_SOURCE # include #undef _GNU_SOURCE #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef HAVE_RUBY_ENCODING_H #include #else #include #endif #ifndef UNUSED # if defined(__GNUC__) # define MAYBE_UNUSED(name) name __attribute__((unused)) # define UNUSED(name) MAYBE_UNUSED(UNUSED_ ## name) # else # define MAYBE_UNUSED(name) name # define UNUSED(name) name # endif #endif #ifndef NORETURN # if defined(__GNUC__) # define NORETURN(name) __attribute__((noreturn)) name # else # define NORETURN(name) name # endif #endif #ifdef HAVE_RUBY_ENCODING_H #include #define NOKOGIRI_STR_NEW2(str) \ NOKOGIRI_STR_NEW(str, strlen((const char *)(str))) #define NOKOGIRI_STR_NEW(str, len) \ rb_external_str_new_with_enc((const char *)(str), (long)(len), rb_utf8_encoding()) #else #define NOKOGIRI_STR_NEW2(str) \ rb_str_new2((const char *)(str)) #define NOKOGIRI_STR_NEW(str, len) \ rb_str_new((const char *)(str), (long)(len)) #endif #define RBSTR_OR_QNIL(_str) \ (_str ? 
NOKOGIRI_STR_NEW2(_str) : Qnil) #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include extern VALUE mNokogiri ; extern VALUE mNokogiriXml ; extern VALUE mNokogiriXmlSax ; extern VALUE mNokogiriHtml ; extern VALUE mNokogiriHtmlSax ; extern VALUE mNokogiriXslt ; void nokogiri_root_node(xmlNodePtr); void nokogiri_root_nsdef(xmlNsPtr, xmlDocPtr); #ifdef DEBUG #define NOKOGIRI_DEBUG_START(p) if (getenv("NOKOGIRI_NO_FREE")) return ; if (getenv("NOKOGIRI_DEBUG")) fprintf(stderr,"nokogiri: %s:%d %p start\n", __FILE__, __LINE__, p); #define NOKOGIRI_DEBUG_END(p) if (getenv("NOKOGIRI_DEBUG")) fprintf(stderr,"nokogiri: %s:%d %p end\n", __FILE__, __LINE__, p); #else #define NOKOGIRI_DEBUG_START(p) #define NOKOGIRI_DEBUG_END(p) #endif #ifndef RSTRING_PTR #define RSTRING_PTR(s) (RSTRING(s)->ptr) #endif #ifndef RSTRING_LEN #define RSTRING_LEN(s) (RSTRING(s)->len) #endif #ifndef RARRAY_PTR #define RARRAY_PTR(a) RARRAY(a)->ptr #endif #ifndef RARRAY_LEN #define RARRAY_LEN(a) RARRAY(a)->len #endif #ifndef __builtin_expect # if defined(__GNUC__) # define __builtin_expect(expr, c) __builtin_expect((long)(expr), (long)(c)) # endif #endif #define XMLNS_PREFIX "xmlns" #define XMLNS_PREFIX_LEN 6 /* including either colon or \0 */ #define XMLNS_BUFFER_LEN 128 #endif nokogiri-1.6.1/ext/nokogiri/xml_node_set.c0000644000175000017500000003071312261213762020171 0ustar boutilboutil#include #include static ID decorate ; static int dealloc_namespace(xmlNsPtr ns) { if (ns->href) xmlFree((xmlChar *)ns->href); if (ns->prefix) xmlFree((xmlChar *)ns->prefix); xmlFree(ns); return ST_CONTINUE; } static void deallocate(nokogiriNodeSetTuple *tuple) { /* * xmlXPathFreeNodeSet() contains an implicit assumption that it is being * called before any of its pointed-to nodes 
have been free()d. this * assumption lies in the operation where it dereferences nodeTab pointers * while searching for namespace nodes to free. * * however, since Ruby's GC mechanism cannot guarantee the strict order in * which ruby objects will be GC'd, nodes may be garbage collected before a * nodeset containing pointers to those nodes. (this is true regardless of * how we declare dependencies between objects with rb_gc_mark().) * * as a result, xmlXPathFreeNodeSet() will perform unsafe memory operations, * and calling it would be evil. * * so here, we *manually* free the set of namespace nodes that was * constructed at initialization time (see Nokogiri_wrap_xml_node_set()), as * well as the NodeSet, without using the official xmlXPathFreeNodeSet(). * * there's probably a lesson in here somewhere about intermingling, within a * single array, structs with different memory-ownership semantics. or more * generally, a lesson about building an API in C/C++ that does not contain * assumptions about the strict order in which memory will be released. hey, * that sounds like a great idea for a blog post! get to it! * * "In Valgrind We Trust." seriously. 
*/ xmlNodeSetPtr node_set; node_set = tuple->node_set; if (!node_set) return; NOKOGIRI_DEBUG_START(node_set) ; st_foreach(tuple->namespaces, dealloc_namespace, 0); if (node_set->nodeTab != NULL) xmlFree(node_set->nodeTab); xmlFree(node_set); st_free_table(tuple->namespaces); free(tuple); NOKOGIRI_DEBUG_END(node_set) ; } static VALUE allocate(VALUE klass) { return Nokogiri_wrap_xml_node_set(xmlXPathNodeSetCreate(NULL), Qnil); } /* * call-seq: * dup * * Duplicate this node set */ static VALUE duplicate(VALUE self) { nokogiriNodeSetTuple *tuple; xmlNodeSetPtr dupl; Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); dupl = xmlXPathNodeSetMerge(NULL, tuple->node_set); return Nokogiri_wrap_xml_node_set(dupl, rb_iv_get(self, "@document")); } /* * call-seq: * length * * Get the length of the node set */ static VALUE length(VALUE self) { nokogiriNodeSetTuple *tuple; Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); return tuple->node_set ? INT2NUM(tuple->node_set->nodeNr) : INT2NUM(0); } /* * call-seq: * push(node) * * Append +node+ to the NodeSet. */ static VALUE push(VALUE self, VALUE rb_node) { nokogiriNodeSetTuple *tuple; xmlNodePtr node; if(!(rb_obj_is_kind_of(rb_node, cNokogiriXmlNode) || rb_obj_is_kind_of(rb_node, cNokogiriXmlNamespace))) rb_raise(rb_eArgError, "node must be a Nokogiri::XML::Node or Nokogiri::XML::Namespace"); Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); Data_Get_Struct(rb_node, xmlNode, node); xmlXPathNodeSetAdd(tuple->node_set, node); return self; } /* * call-seq: * delete(node) * * Delete +node+ from the NodeSet, if it is a member. Returns the deleted node * if found, otherwise returns nil.
*/ static VALUE delete(VALUE self, VALUE rb_node) { nokogiriNodeSetTuple *tuple; xmlNodePtr node; xmlNodeSetPtr cur; int i; if (!(rb_obj_is_kind_of(rb_node, cNokogiriXmlNode) || rb_obj_is_kind_of(rb_node, cNokogiriXmlNamespace))) rb_raise(rb_eArgError, "node must be a Nokogiri::XML::Node or Nokogiri::XML::Namespace"); Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); Data_Get_Struct(rb_node, xmlNode, node); cur = tuple->node_set; if (xmlXPathNodeSetContains(cur, node)) { for (i = 0; i < cur->nodeNr; i++) if (cur->nodeTab[i] == node) break; cur->nodeNr--; for (;i < cur->nodeNr;i++) cur->nodeTab[i] = cur->nodeTab[i + 1]; cur->nodeTab[cur->nodeNr] = NULL; return rb_node; } return Qnil ; } /* * call-seq: * &(node_set) * * Set Intersection — Returns a new NodeSet containing nodes common to the two NodeSets. */ static VALUE intersection(VALUE self, VALUE rb_other) { nokogiriNodeSetTuple *tuple, *other; xmlNodeSetPtr intersection; if(!rb_obj_is_kind_of(rb_other, cNokogiriXmlNodeSet)) rb_raise(rb_eArgError, "node_set must be a Nokogiri::XML::NodeSet"); Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); Data_Get_Struct(rb_other, nokogiriNodeSetTuple, other); intersection = xmlXPathIntersection(tuple->node_set, other->node_set); return Nokogiri_wrap_xml_node_set(intersection, rb_iv_get(self, "@document")); } /* * call-seq: * include?(node) * * Returns true if any member of node set equals +node+. */ static VALUE include_eh(VALUE self, VALUE rb_node) { nokogiriNodeSetTuple *tuple; xmlNodePtr node; if(!(rb_obj_is_kind_of(rb_node, cNokogiriXmlNode) || rb_obj_is_kind_of(rb_node, cNokogiriXmlNamespace))) rb_raise(rb_eArgError, "node must be a Nokogiri::XML::Node or Nokogiri::XML::Namespace"); Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); Data_Get_Struct(rb_node, xmlNode, node); return (xmlXPathNodeSetContains(tuple->node_set, node) ? Qtrue : Qfalse); } /* * call-seq: * |(node_set) * * Returns a new set built by merging the set and the elements of the given * set. 
*/ static VALUE set_union(VALUE self, VALUE rb_other) { nokogiriNodeSetTuple *tuple, *other; xmlNodeSetPtr new; if(!rb_obj_is_kind_of(rb_other, cNokogiriXmlNodeSet)) rb_raise(rb_eArgError, "node_set must be a Nokogiri::XML::NodeSet"); Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); Data_Get_Struct(rb_other, nokogiriNodeSetTuple, other); new = xmlXPathNodeSetMerge(NULL, tuple->node_set); new = xmlXPathNodeSetMerge(new, other->node_set); return Nokogiri_wrap_xml_node_set(new, rb_iv_get(self, "@document")); } /* * call-seq: * -(node_set) * * Difference - returns a new NodeSet that is a copy of this NodeSet, removing * each item that also appears in +node_set+ */ static VALUE minus(VALUE self, VALUE rb_other) { nokogiriNodeSetTuple *tuple, *other; xmlNodeSetPtr new; int j ; if(!rb_obj_is_kind_of(rb_other, cNokogiriXmlNodeSet)) rb_raise(rb_eArgError, "node_set must be a Nokogiri::XML::NodeSet"); Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); Data_Get_Struct(rb_other, nokogiriNodeSetTuple, other); new = xmlXPathNodeSetMerge(NULL, tuple->node_set); for (j = 0 ; j < other->node_set->nodeNr ; ++j) { xmlXPathNodeSetDel(new, other->node_set->nodeTab[j]); } return Nokogiri_wrap_xml_node_set(new, rb_iv_get(self, "@document")); } static VALUE index_at(VALUE self, long offset) { xmlNodeSetPtr node_set; nokogiriNodeSetTuple *tuple; Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); node_set = tuple->node_set; if (offset >= node_set->nodeNr || abs((int)offset) > node_set->nodeNr) return Qnil; if (offset < 0) offset += node_set->nodeNr; if (XML_NAMESPACE_DECL == node_set->nodeTab[offset]->type) return Nokogiri_wrap_xml_namespace2(rb_iv_get(self, "@document"), (xmlNsPtr)(node_set->nodeTab[offset])); return Nokogiri_wrap_xml_node(Qnil, node_set->nodeTab[offset]); } static VALUE subseq(VALUE self, long beg, long len) { long j; nokogiriNodeSetTuple *tuple; xmlNodeSetPtr node_set; xmlNodeSetPtr new_set ; Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); node_set = 
tuple->node_set; if (beg > node_set->nodeNr) return Qnil ; if (beg < 0 || len < 0) return Qnil ; if ((beg + len) > node_set->nodeNr) { len = node_set->nodeNr - beg ; } new_set = xmlXPathNodeSetCreate(NULL); for (j = beg ; j < beg+len ; ++j) { xmlXPathNodeSetAddUnique(new_set, node_set->nodeTab[j]); } return Nokogiri_wrap_xml_node_set(new_set, rb_iv_get(self, "@document")); } /* * call-seq: * [index] -> Node or nil * [start, length] -> NodeSet or nil * [range] -> NodeSet or nil * slice(index) -> Node or nil * slice(start, length) -> NodeSet or nil * slice(range) -> NodeSet or nil * * Element reference - returns the node at +index+, or returns a NodeSet * containing nodes starting at +start+ and continuing for +length+ elements, or * returns a NodeSet containing nodes specified by +range+. Negative +indices+ * count backward from the end of the +node_set+ (-1 is the last node). Returns * nil if the +index+ (or +start+) is out of range. */ static VALUE slice(int argc, VALUE *argv, VALUE self) { VALUE arg ; long beg, len ; xmlNodeSetPtr node_set; nokogiriNodeSetTuple *tuple; Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); node_set = tuple->node_set; if (argc == 2) { beg = NUM2LONG(argv[0]); len = NUM2LONG(argv[1]); if (beg < 0) { beg += node_set->nodeNr ; } return subseq(self, beg, len); } if (argc != 1) { rb_scan_args(argc, argv, "11", NULL, NULL); } arg = argv[0]; if (FIXNUM_P(arg)) { return index_at(self, FIX2LONG(arg)); } /* if arg is Range */ switch (rb_range_beg_len(arg, &beg, &len, (long)node_set->nodeNr, 0)) { case Qfalse: break; case Qnil: return Qnil; default: return subseq(self, beg, len); } return index_at(self, NUM2LONG(arg)); } /* * call-seq: * to_a * * Return this list as an Array */ static VALUE to_array(VALUE self) { xmlNodeSetPtr set; VALUE *elts; VALUE list; int i; nokogiriNodeSetTuple *tuple; Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); set = tuple->node_set; /* scratch buffer of VALUEs (not VALUE pointers), one slot per node */ elts = calloc((size_t)set->nodeNr, sizeof(VALUE)); for(i = 0;
i < set->nodeNr; i++) { if (XML_NAMESPACE_DECL == set->nodeTab[i]->type) elts[i] = Nokogiri_wrap_xml_namespace2(rb_iv_get(self, "@document"), (xmlNsPtr)(set->nodeTab[i])); else elts[i] = Nokogiri_wrap_xml_node(Qnil, set->nodeTab[i]); } list = rb_ary_new4((long)set->nodeNr, elts); /* rb_ary_new4() copies the values into the new array, so the scratch buffer can be released */ free(elts); return list; } /* * call-seq: * unlink * * Unlink this NodeSet and all Node objects it contains from their current context. */ static VALUE unlink_nodeset(VALUE self) { xmlNodeSetPtr node_set; int j, nodeNr ; nokogiriNodeSetTuple *tuple; Data_Get_Struct(self, nokogiriNodeSetTuple, tuple); node_set = tuple->node_set; nodeNr = node_set->nodeNr ; for (j = 0 ; j < nodeNr ; j++) { if (XML_NAMESPACE_DECL != node_set->nodeTab[j]->type) { VALUE node ; xmlNodePtr node_ptr; node = Nokogiri_wrap_xml_node(Qnil, node_set->nodeTab[j]); rb_funcall(node, rb_intern("unlink"), 0); /* modifies the C struct out from under the object */ Data_Get_Struct(node, xmlNode, node_ptr); node_set->nodeTab[j] = node_ptr ; } } return self ; } VALUE Nokogiri_wrap_xml_node_set(xmlNodeSetPtr node_set, VALUE document) { VALUE new_set ; int i; xmlNodePtr cur; xmlNsPtr ns; nokogiriNodeSetTuple *tuple; new_set = Data_Make_Struct(cNokogiriXmlNodeSet, nokogiriNodeSetTuple, 0, deallocate, tuple); tuple->node_set = node_set; tuple->namespaces = st_init_numtable(); if (!NIL_P(document)) { rb_iv_set(new_set, "@document", document); rb_funcall(document, decorate, 1, new_set); } if (node_set && node_set->nodeTab) { for (i = 0; i < node_set->nodeNr; i++) { cur = node_set->nodeTab[i]; if (cur && cur->type == XML_NAMESPACE_DECL) { ns = (xmlNsPtr)cur; if (ns->next && ns->next->type != XML_NAMESPACE_DECL) st_insert(tuple->namespaces, (st_data_t)cur, (st_data_t)0); } } } return new_set ; } VALUE cNokogiriXmlNodeSet ; void init_xml_node_set(void) { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE klass = rb_define_class_under(xml, "NodeSet", rb_cObject);
cNokogiriXmlNodeSet = klass; rb_define_alloc_func(klass, allocate); rb_define_method(klass, "length", length, 0); rb_define_method(klass, "[]", slice, -1); rb_define_method(klass, "slice", slice, -1); rb_define_method(klass, "push", push, 1); rb_define_method(klass, "|", set_union, 1); rb_define_method(klass, "-", minus, 1); rb_define_method(klass, "unlink", unlink_nodeset, 0); rb_define_method(klass, "to_a", to_array, 0); rb_define_method(klass, "dup", duplicate, 0); rb_define_method(klass, "delete", delete, 1); rb_define_method(klass, "&", intersection, 1); rb_define_method(klass, "include?", include_eh, 1); decorate = rb_intern("decorate"); } nokogiri-1.6.1/ext/nokogiri/xml_sax_push_parser.h0000644000175000017500000000026412261213762021602 0ustar boutilboutil#ifndef NOKOGIRI_XML_SAX_PUSH_PARSER #define NOKOGIRI_XML_SAX_PUSH_PARSER #include void init_xml_sax_push_parser(); extern VALUE cNokogiriXmlSaxPushParser ; #endif nokogiri-1.6.1/ext/nokogiri/html_entity_lookup.h0000644000175000017500000000021012261213762021434 0ustar boutilboutil#ifndef NOKOGIRI_HTML_ENTITY_LOOKUP #define NOKOGIRI_HTML_ENTITY_LOOKUP #include void init_html_entity_lookup(); #endif nokogiri-1.6.1/ext/nokogiri/xml_node.c0000644000175000017500000011205112261213762017312 0ustar boutilboutil#include static ID decorate, decorate_bang; #ifdef DEBUG static void debug_node_dealloc(xmlNodePtr x) { NOKOGIRI_DEBUG_START(x) NOKOGIRI_DEBUG_END(x) } #else # define debug_node_dealloc 0 #endif static void mark(xmlNodePtr node) { rb_gc_mark(DOC_RUBY_OBJECT(node->doc)); } /* :nodoc: */ typedef xmlNodePtr (*pivot_reparentee_func)(xmlNodePtr, xmlNodePtr); /* :nodoc: */ static void relink_namespace(xmlNodePtr reparented) { xmlChar *name, *prefix; xmlNodePtr child; xmlNsPtr ns; if (reparented->type != XML_ATTRIBUTE_NODE && reparented->type != XML_ELEMENT_NODE) return; if (reparented->ns == NULL || reparented->ns->prefix == NULL) { name = xmlSplitQName2(reparented->name, &prefix); if(reparented->type == 
XML_ATTRIBUTE_NODE) { if (prefix == NULL || strcmp((char*)prefix, XMLNS_PREFIX) == 0) return; } ns = xmlSearchNs(reparented->doc, reparented, prefix); if (ns == NULL && reparented->parent) { ns = xmlSearchNs(reparented->doc, reparented->parent, prefix); } if (ns != NULL) { xmlNodeSetName(reparented, name); xmlSetNs(reparented, ns); } } /* Avoid segv when relinking against unlinked nodes. */ if (reparented->type != XML_ELEMENT_NODE || !reparented->parent) return; /* Make sure that our reparented node has the correct namespaces */ if(!reparented->ns && reparented->doc != (xmlDocPtr)reparented->parent) xmlSetNs(reparented, reparented->parent->ns); /* Search our parents for an existing definition */ if(reparented->nsDef) { xmlNsPtr curr = reparented->nsDef; xmlNsPtr prev = NULL; while(curr) { xmlNsPtr ns = xmlSearchNsByHref( reparented->doc, reparented->parent, curr->href ); /* If we find the namespace is already declared, remove it from this * definition list. */ if(ns && ns != curr) { if (prev) { prev->next = curr->next; } else { reparented->nsDef = curr->next; } nokogiri_root_nsdef(curr, reparented->doc); } else { prev = curr; } curr = curr->next; } } /* Only walk all children if there actually is a namespace we need to */ /* reparent. */ if(NULL == reparented->ns) return; /* When a node gets reparented, walk its children to make sure that */ /* their namespaces are reparented as well.
*/ child = reparented->children; while(NULL != child) { relink_namespace(child); child = child->next; } if (reparented->type == XML_ELEMENT_NODE) { child = (xmlNodePtr)((xmlElementPtr)reparented)->attributes; while(NULL != child) { relink_namespace(child); child = child->next; } } } /* :nodoc: */ static xmlNodePtr xmlReplaceNodeWrapper(xmlNodePtr pivot, xmlNodePtr new_node) { xmlNodePtr retval ; retval = xmlReplaceNode(pivot, new_node) ; if (retval == pivot) { retval = new_node ; /* return semantics for reparent_node_with */ } /* work around libxml2 issue: https://bugzilla.gnome.org/show_bug.cgi?id=615612 */ if (retval && retval->type == XML_TEXT_NODE) { if (retval->prev && retval->prev->type == XML_TEXT_NODE) { retval = xmlTextMerge(retval->prev, retval); } if (retval->next && retval->next->type == XML_TEXT_NODE) { retval = xmlTextMerge(retval, retval->next); } } return retval ; } /* :nodoc: */ static VALUE reparent_node_with(VALUE pivot_obj, VALUE reparentee_obj, pivot_reparentee_func prf) { VALUE reparented_obj ; xmlNodePtr reparentee, pivot, reparented, next_text, new_next_text ; if(!rb_obj_is_kind_of(reparentee_obj, cNokogiriXmlNode)) rb_raise(rb_eArgError, "node must be a Nokogiri::XML::Node"); if(rb_obj_is_kind_of(reparentee_obj, cNokogiriXmlDocument)) rb_raise(rb_eArgError, "node must be a Nokogiri::XML::Node"); Data_Get_Struct(reparentee_obj, xmlNode, reparentee); Data_Get_Struct(pivot_obj, xmlNode, pivot); if(XML_DOCUMENT_NODE == reparentee->type || XML_HTML_DOCUMENT_NODE == reparentee->type) rb_raise(rb_eArgError, "cannot reparent a document node"); xmlUnlinkNode(reparentee); if (reparentee->doc != pivot->doc || reparentee->type == XML_TEXT_NODE) { /* * if the reparentee is a text node, there's a very good chance it will be * merged with an adjacent text node after being reparented, and in that case * libxml will free the underlying C struct. 
* * since we clearly have a ruby object which references the underlying * memory, we can't let the C struct get freed. let's pickle the original * reparentee by rooting it; and then we'll reparent a duplicate of the * node that we don't care about preserving. * * alternatively, if the reparentee is from a different document than the * pivot node, libxml2 is going to get confused about which document's * "dictionary" the node's strings belong to (this is an otherwise * uninteresting libxml2 implementation detail). as a result, we cannot * reparent the actual reparentee, so we reparent a duplicate. */ nokogiri_root_node(reparentee); if (!(reparentee = xmlDocCopyNode(reparentee, pivot->doc, 1))) { rb_raise(rb_eRuntimeError, "Could not reparent node (xmlDocCopyNode)"); } } if (prf != xmlAddPrevSibling && prf != xmlAddNextSibling && reparentee->type == XML_TEXT_NODE && pivot->next && pivot->next->type == XML_TEXT_NODE) { /* * libxml merges text nodes in a right-to-left fashion, meaning that if * there are two text nodes who would be adjacent, the right (or following, * or next) node will be merged into the left (or preceding, or previous) * node. * * and by "merged" I mean the string contents will be concatenated onto the * left node's contents, and then the node will be freed. * * which means that if we have a ruby object wrapped around the right node, * its memory would be freed out from under it. * * so, we detect this edge case and unlink-and-root the text node before it gets * merged. then we dup the node and insert that duplicate back into the * document where the real node was. * * yes, this is totally lame. 
*/ next_text = pivot->next ; new_next_text = xmlDocCopyNode(next_text, pivot->doc, 1) ; xmlUnlinkNode(next_text); nokogiri_root_node(next_text); xmlAddNextSibling(pivot, new_next_text); } if(!(reparented = (*prf)(pivot, reparentee))) { rb_raise(rb_eRuntimeError, "Could not reparent node"); } /* * make sure the ruby object is pointed at the just-reparented node, which * might be a duplicate (see above) or might be the result of merging * adjacent text nodes. */ DATA_PTR(reparentee_obj) = reparented ; relink_namespace(reparented); reparented_obj = Nokogiri_wrap_xml_node(Qnil, reparented); rb_funcall(reparented_obj, decorate_bang, 0); return reparented_obj ; } /* * call-seq: * document * * Get the document for this Node */ static VALUE document(VALUE self) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); return DOC_RUBY_OBJECT(node->doc); } /* * call-seq: * pointer_id * * Get the internal pointer number */ static VALUE pointer_id(VALUE self) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); return INT2NUM((long)(node)); } /* * call-seq: * encode_special_chars(string) * * Encode any special characters in +string+ */ static VALUE encode_special_chars(VALUE self, VALUE string) { xmlNodePtr node; xmlChar *encoded; VALUE encoded_str; Data_Get_Struct(self, xmlNode, node); encoded = xmlEncodeSpecialChars( node->doc, (const xmlChar *)StringValuePtr(string) ); encoded_str = NOKOGIRI_STR_NEW2(encoded); xmlFree(encoded); return encoded_str; } /* * call-seq: * create_internal_subset(name, external_id, system_id) * * Create the internal subset of a document. 
* * doc.create_internal_subset("chapter", "-//OASIS//DTD DocBook XML//EN", "chapter.dtd") * # => <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML//EN" "chapter.dtd"> * * doc.create_internal_subset("chapter", nil, "chapter.dtd") * # => <!DOCTYPE chapter SYSTEM "chapter.dtd"> */ static VALUE create_internal_subset(VALUE self, VALUE name, VALUE external_id, VALUE system_id) { xmlNodePtr node; xmlDocPtr doc; xmlDtdPtr dtd; Data_Get_Struct(self, xmlNode, node); doc = node->doc; if(xmlGetIntSubset(doc)) rb_raise(rb_eRuntimeError, "Document already has an internal subset"); dtd = xmlCreateIntSubset( doc, NIL_P(name) ? NULL : (const xmlChar *)StringValuePtr(name), NIL_P(external_id) ? NULL : (const xmlChar *)StringValuePtr(external_id), NIL_P(system_id) ? NULL : (const xmlChar *)StringValuePtr(system_id) ); if(!dtd) return Qnil; return Nokogiri_wrap_xml_node(Qnil, (xmlNodePtr)dtd); } /* * call-seq: * create_external_subset(name, external_id, system_id) * * Create an external subset */ static VALUE create_external_subset(VALUE self, VALUE name, VALUE external_id, VALUE system_id) { xmlNodePtr node; xmlDocPtr doc; xmlDtdPtr dtd; Data_Get_Struct(self, xmlNode, node); doc = node->doc; if(doc->extSubset) rb_raise(rb_eRuntimeError, "Document already has an external subset"); dtd = xmlNewDtd( doc, NIL_P(name) ? NULL : (const xmlChar *)StringValuePtr(name), NIL_P(external_id) ? NULL : (const xmlChar *)StringValuePtr(external_id), NIL_P(system_id) ?
NULL : (const xmlChar *)StringValuePtr(system_id) ); if(!dtd) return Qnil; return Nokogiri_wrap_xml_node(Qnil, (xmlNodePtr)dtd); } /* * call-seq: * external_subset * * Get the external subset */ static VALUE external_subset(VALUE self) { xmlNodePtr node; xmlDocPtr doc; xmlDtdPtr dtd; Data_Get_Struct(self, xmlNode, node); if(!node->doc) return Qnil; doc = node->doc; dtd = doc->extSubset; if(!dtd) return Qnil; return Nokogiri_wrap_xml_node(Qnil, (xmlNodePtr)dtd); } /* * call-seq: * internal_subset * * Get the internal subset */ static VALUE internal_subset(VALUE self) { xmlNodePtr node; xmlDocPtr doc; xmlDtdPtr dtd; Data_Get_Struct(self, xmlNode, node); if(!node->doc) return Qnil; doc = node->doc; dtd = xmlGetIntSubset(doc); if(!dtd) return Qnil; return Nokogiri_wrap_xml_node(Qnil, (xmlNodePtr)dtd); } /* * call-seq: * dup * * Copy this node. An optional depth may be passed in, but it defaults * to a deep copy. 0 is a shallow copy, 1 is a deep copy. */ static VALUE duplicate_node(int argc, VALUE *argv, VALUE self) { VALUE level; xmlNodePtr node, dup; if(rb_scan_args(argc, argv, "01", &level) == 0) level = INT2NUM((long)1); Data_Get_Struct(self, xmlNode, node); dup = xmlDocCopyNode(node, node->doc, (int)NUM2INT(level)); if(dup == NULL) return Qnil; nokogiri_root_node(dup); return Nokogiri_wrap_xml_node(rb_obj_class(self), dup); } /* * call-seq: * unlink * * Unlink this node from its current context. */ static VALUE unlink_node(VALUE self) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); xmlUnlinkNode(node); nokogiri_root_node(node); return self; } /* * call-seq: * blank? * * Is this node blank? */ static VALUE blank_eh(VALUE self) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); return (1 == xmlIsBlankNode(node)) ? 
Qtrue : Qfalse ; } /* * call-seq: * next_sibling * * Returns the next sibling node */ static VALUE next_sibling(VALUE self) { xmlNodePtr node, sibling; Data_Get_Struct(self, xmlNode, node); sibling = node->next; if(!sibling) return Qnil; return Nokogiri_wrap_xml_node(Qnil, sibling) ; } /* * call-seq: * previous_sibling * * Returns the previous sibling node */ static VALUE previous_sibling(VALUE self) { xmlNodePtr node, sibling; Data_Get_Struct(self, xmlNode, node); sibling = node->prev; if(!sibling) return Qnil; return Nokogiri_wrap_xml_node(Qnil, sibling); } /* * call-seq: * next_element * * Returns the next Nokogiri::XML::Element type sibling node. */ static VALUE next_element(VALUE self) { xmlNodePtr node, sibling; Data_Get_Struct(self, xmlNode, node); sibling = xmlNextElementSibling(node); if(!sibling) return Qnil; return Nokogiri_wrap_xml_node(Qnil, sibling); } /* * call-seq: * previous_element * * Returns the previous Nokogiri::XML::Element type sibling node. */ static VALUE previous_element(VALUE self) { xmlNodePtr node, sibling; Data_Get_Struct(self, xmlNode, node); /* * note that we don't use xmlPreviousElementSibling here because it's buggy pre-2.7.7. */ sibling = node->prev; if(!sibling) return Qnil; while(sibling && sibling->type != XML_ELEMENT_NODE) sibling = sibling->prev; return sibling ? 
Nokogiri_wrap_xml_node(Qnil, sibling) : Qnil ; } /* :nodoc: */ static VALUE replace(VALUE self, VALUE new_node) { VALUE reparent = reparent_node_with(self, new_node, xmlReplaceNodeWrapper); xmlNodePtr pivot; Data_Get_Struct(self, xmlNode, pivot); nokogiri_root_node(pivot); return reparent; } /* * call-seq: * children * * Get the list of children for this node as a NodeSet */ static VALUE children(VALUE self) { xmlNodePtr node; xmlNodePtr child; xmlNodeSetPtr set; VALUE document; VALUE node_set; Data_Get_Struct(self, xmlNode, node); child = node->children; set = xmlXPathNodeSetCreate(child); document = DOC_RUBY_OBJECT(node->doc); if(!child) return Nokogiri_wrap_xml_node_set(set, document); child = child->next; while(NULL != child) { xmlXPathNodeSetAddUnique(set, child); child = child->next; } node_set = Nokogiri_wrap_xml_node_set(set, document); return node_set; } /* * call-seq: * element_children * * Get the list of children for this node as a NodeSet. All nodes will be * element nodes. * * Example: * * @doc.root.element_children.all? { |x| x.element? } # => true */ static VALUE element_children(VALUE self) { xmlNodePtr node; xmlNodePtr child; xmlNodeSetPtr set; VALUE document; VALUE node_set; Data_Get_Struct(self, xmlNode, node); child = xmlFirstElementChild(node); set = xmlXPathNodeSetCreate(child); document = DOC_RUBY_OBJECT(node->doc); if(!child) return Nokogiri_wrap_xml_node_set(set, document); child = xmlNextElementSibling(child); while(NULL != child) { xmlXPathNodeSetAddUnique(set, child); child = xmlNextElementSibling(child); } node_set = Nokogiri_wrap_xml_node_set(set, document); return node_set; } /* * call-seq: * child * * Returns the child node */ static VALUE child(VALUE self) { xmlNodePtr node, child; Data_Get_Struct(self, xmlNode, node); child = node->children; if(!child) return Qnil; return Nokogiri_wrap_xml_node(Qnil, child); } /* * call-seq: * first_element_child * * Returns the first child node of this node that is an element. 
* * Example: * * @doc.root.first_element_child.element? # => true */ static VALUE first_element_child(VALUE self) { xmlNodePtr node, child; Data_Get_Struct(self, xmlNode, node); child = xmlFirstElementChild(node); if(!child) return Qnil; return Nokogiri_wrap_xml_node(Qnil, child); } /* * call-seq: * last_element_child * * Returns the last child node of this node that is an element. * * Example: * * @doc.root.last_element_child.element? # => true */ static VALUE last_element_child(VALUE self) { xmlNodePtr node, child; Data_Get_Struct(self, xmlNode, node); child = xmlLastElementChild(node); if(!child) return Qnil; return Nokogiri_wrap_xml_node(Qnil, child); } /* * call-seq: * key?(attribute) * * Returns true if +attribute+ is set */ static VALUE key_eh(VALUE self, VALUE attribute) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); if(xmlHasProp(node, (xmlChar *)StringValuePtr(attribute))) return Qtrue; return Qfalse; } /* * call-seq: * namespaced_key?(attribute, namespace) * * Returns true if +attribute+ is set with +namespace+ */ static VALUE namespaced_key_eh(VALUE self, VALUE attribute, VALUE namespace) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); if(xmlHasNsProp(node, (xmlChar *)StringValuePtr(attribute), NIL_P(namespace) ? NULL : (xmlChar *)StringValuePtr(namespace))) return Qtrue; return Qfalse; } /* * call-seq: * []=(property, value) * * Set the +property+ to +value+ */ static VALUE set(VALUE self, VALUE property, VALUE value) { xmlNodePtr node, cur; xmlAttrPtr prop; Data_Get_Struct(self, xmlNode, node); /* If a matching attribute node already exists, then xmlSetProp will destroy * the existing node's children. However, if Nokogiri has a node object * pointing to one of those children, we are left with a broken reference. * * We can avoid this by unlinking these nodes first. 
*/ if (node->type != XML_ELEMENT_NODE) return(Qnil); prop = xmlHasProp(node, (xmlChar *)StringValuePtr(property)); if (prop && prop->children) { for (cur = prop->children; cur; cur = cur->next) { if (cur->_private) { nokogiri_root_node(cur); xmlUnlinkNode(cur); } } } xmlSetProp(node, (xmlChar *)StringValuePtr(property), (xmlChar *)StringValuePtr(value)); return value; } /* * call-seq: * get(attribute) * * Get the value for +attribute+ */ static VALUE get(VALUE self, VALUE rattribute) { xmlNodePtr node; xmlChar* value = 0; VALUE rvalue ; char* attribute = 0; char *colon = 0, *attr_name = 0, *prefix = 0; xmlNsPtr ns; if (NIL_P(rattribute)) return Qnil; Data_Get_Struct(self, xmlNode, node); attribute = strdup(StringValuePtr(rattribute)); colon = strchr(attribute, ':'); if (colon) { (*colon) = 0 ; /* create two null-terminated strings of the prefix and attribute name */ prefix = attribute ; attr_name = colon + 1 ; ns = xmlSearchNs(node->doc, node, (const xmlChar *)(prefix)); if (ns) { value = xmlGetNsProp(node, (xmlChar*)(attr_name), ns->href); } else { value = xmlGetProp(node, (xmlChar*)StringValuePtr(rattribute)); } } else { value = xmlGetNoNsProp(node, (xmlChar*)attribute); } free(attribute); if (!value) return Qnil; rvalue = NOKOGIRI_STR_NEW2(value); xmlFree(value); return rvalue ; } /* * call-seq: * set_namespace(namespace) * * Set the namespace to +namespace+ */ static VALUE set_namespace(VALUE self, VALUE namespace) { xmlNodePtr node; xmlNsPtr ns = NULL; Data_Get_Struct(self, xmlNode, node); if(!NIL_P(namespace)) Data_Get_Struct(namespace, xmlNs, ns); xmlSetNs(node, ns); return self; } /* * call-seq: * attribute(name) * * Get the attribute node with +name+ */ static VALUE attr(VALUE self, VALUE name) { xmlNodePtr node; xmlAttrPtr prop; Data_Get_Struct(self, xmlNode, node); prop = xmlHasProp(node, (xmlChar *)StringValuePtr(name)); if(! 
prop) return Qnil;
  return Nokogiri_wrap_xml_node(Qnil, (xmlNodePtr)prop);
}

/*
 * call-seq:
 *  attribute_with_ns(name, namespace)
 *
 * Get the attribute node with +name+ and +namespace+
 */
static VALUE attribute_with_ns(VALUE self, VALUE name, VALUE namespace)
{
  xmlNodePtr node;
  xmlAttrPtr prop;
  Data_Get_Struct(self, xmlNode, node);
  prop = xmlHasNsProp(node, (xmlChar *)StringValuePtr(name),
      NIL_P(namespace) ? NULL : (xmlChar *)StringValuePtr(namespace));
  if(! prop) return Qnil;
  return Nokogiri_wrap_xml_node(Qnil, (xmlNodePtr)prop);
}

/*
 * call-seq:
 *  attribute_nodes()
 *
 * returns a list containing the Node attributes.
 */
static VALUE attribute_nodes(VALUE self)
{
  /* this code in the mode of xmlHasProp() */
  xmlNodePtr node;
  VALUE attr;

  Data_Get_Struct(self, xmlNode, node);

  attr = rb_ary_new();
  Nokogiri_xml_node_properties(node, attr);

  return attr ;
}

/*
 * call-seq:
 *  namespace()
 *
 * returns the default namespace set on this node (as with an "xmlns="
 * attribute), as a Namespace object.
 */
static VALUE namespace(VALUE self)
{
  xmlNodePtr node ;
  Data_Get_Struct(self, xmlNode, node);

  if (node->ns)
    return Nokogiri_wrap_xml_namespace(node->doc, node->ns);

  return Qnil ;
}

/*
 * call-seq:
 *  namespace_definitions()
 *
 * returns namespaces defined on self element directly, as an array of
 * Namespace objects. Includes both a default namespace (as in "xmlns="),
 * and prefixed namespaces (as in "xmlns:prefix=").
 */
static VALUE namespace_definitions(VALUE self)
{
  /* this code in the mode of xmlHasProp() */
  xmlNodePtr node ;
  VALUE list;
  xmlNsPtr ns;

  Data_Get_Struct(self, xmlNode, node);

  list = rb_ary_new();

  ns = node->nsDef;

  if(!ns) return list;

  while(NULL != ns) {
    rb_ary_push(list, Nokogiri_wrap_xml_namespace(node->doc, ns));
    ns = ns->next;
  }

  return list;
}

/*
 * call-seq:
 *  namespace_scopes()
 *
 * returns namespaces in scope for self -- those defined on self element
 * directly or any ancestor node -- as an array of Namespace objects.
Default * namespaces ("xmlns=" style) for self are included in this array; Default * namespaces for ancestors, however, are not. See also #namespaces */ static VALUE namespace_scopes(VALUE self) { xmlNodePtr node ; VALUE list; xmlNsPtr *ns_list; int j; Data_Get_Struct(self, xmlNode, node); list = rb_ary_new(); ns_list = xmlGetNsList(node->doc, node); if(!ns_list) return list; for (j = 0 ; ns_list[j] != NULL ; ++j) { rb_ary_push(list, Nokogiri_wrap_xml_namespace(node->doc, ns_list[j])); } xmlFree(ns_list); return list; } /* * call-seq: * node_type * * Get the type for this Node */ static VALUE node_type(VALUE self) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); return INT2NUM((long)node->type); } /* * call-seq: * content= * * Set the content for this Node */ static VALUE native_content(VALUE self, VALUE content) { xmlNodePtr node, child, next ; Data_Get_Struct(self, xmlNode, node); child = node->children; while (NULL != child) { next = child->next ; xmlUnlinkNode(child) ; nokogiri_root_node(child); child = next ; } xmlNodeSetContent(node, (xmlChar *)StringValuePtr(content)); return content; } /* * call-seq: * content * * Returns the content for this Node */ static VALUE get_content(VALUE self) { xmlNodePtr node; xmlChar * content; Data_Get_Struct(self, xmlNode, node); content = xmlNodeGetContent(node); if(content) { VALUE rval = NOKOGIRI_STR_NEW2(content); xmlFree(content); return rval; } return Qnil; } /* :nodoc: */ static VALUE add_child(VALUE self, VALUE new_child) { return reparent_node_with(self, new_child, xmlAddChild); } /* * call-seq: * parent * * Get the parent Node for this Node */ static VALUE get_parent(VALUE self) { xmlNodePtr node, parent; Data_Get_Struct(self, xmlNode, node); parent = node->parent; if(!parent) return Qnil; return Nokogiri_wrap_xml_node(Qnil, parent) ; } /* * call-seq: * name=(new_name) * * Set the name for this Node */ static VALUE set_name(VALUE self, VALUE new_name) { xmlNodePtr node; Data_Get_Struct(self, xmlNode, node); 
  xmlNodeSetName(node, (xmlChar*)StringValuePtr(new_name));
  return new_name;
}

/*
 * call-seq:
 *  name
 *
 * Returns the name for this Node
 */
static VALUE get_name(VALUE self)
{
  xmlNodePtr node;
  Data_Get_Struct(self, xmlNode, node);
  if(node->name)
    return NOKOGIRI_STR_NEW2(node->name);
  return Qnil;
}

/*
 * call-seq:
 *  path
 *
 * Returns the path associated with this Node
 */
static VALUE path(VALUE self)
{
  xmlNodePtr node;
  xmlChar *path ;
  VALUE rval;

  Data_Get_Struct(self, xmlNode, node);

  path = xmlGetNodePath(node);
  rval = NOKOGIRI_STR_NEW2(path);
  xmlFree(path);
  return rval ;
}

/* :nodoc: */
static VALUE add_next_sibling(VALUE self, VALUE new_sibling)
{
  return reparent_node_with(self, new_sibling, xmlAddNextSibling) ;
}

/* :nodoc: */
static VALUE add_previous_sibling(VALUE self, VALUE new_sibling)
{
  return reparent_node_with(self, new_sibling, xmlAddPrevSibling) ;
}

/*
 * call-seq:
 *  native_write_to(io, encoding, indent_string, options)
 *
 * Write this Node to +io+ with +encoding+, +indent_string+, and +options+
 */
static VALUE native_write_to(
    VALUE self,
    VALUE io,
    VALUE encoding,
    VALUE indent_string,
    VALUE options
) {
  xmlNodePtr node;
  const char * before_indent;
  xmlSaveCtxtPtr savectx;

  Data_Get_Struct(self, xmlNode, node);

  xmlIndentTreeOutput = 1;

  before_indent = xmlTreeIndentString;

  xmlTreeIndentString = StringValuePtr(indent_string);

  savectx = xmlSaveToIO(
      (xmlOutputWriteCallback)io_write_callback,
      (xmlOutputCloseCallback)io_close_callback,
      (void *)io,
      RTEST(encoding) ? StringValuePtr(encoding) : NULL,
      (int)NUM2INT(options)
  );

  xmlSaveTree(savectx, node);
  xmlSaveClose(savectx);

  xmlTreeIndentString = before_indent;
  return io;
}

/*
 * call-seq:
 *  line
 *
 * Returns the line for this Node
 */
static VALUE line(VALUE self)
{
  xmlNodePtr node;
  Data_Get_Struct(self, xmlNode, node);

  return INT2NUM(xmlGetLineNo(node));
}

/*
 * call-seq:
 *  add_namespace_definition(prefix, href)
 *
 * Adds a namespace definition with +prefix+ using +href+ value.
The result is * as if parsed XML for this node had included an attribute * 'xmlns:prefix=value'. A default namespace for this node ("xmlns=") can be * added by passing 'nil' for prefix. Namespaces added this way will not * show up in #attributes, but they will be included as an xmlns attribute * when the node is serialized to XML. */ static VALUE add_namespace_definition(VALUE self, VALUE prefix, VALUE href) { xmlNodePtr node, namespacee; xmlNsPtr ns; Data_Get_Struct(self, xmlNode, node); namespacee = node ; ns = xmlSearchNs( node->doc, node, (const xmlChar *)(NIL_P(prefix) ? NULL : StringValuePtr(prefix)) ); if(!ns) { if (node->type != XML_ELEMENT_NODE) { namespacee = node->parent; } ns = xmlNewNs( namespacee, (const xmlChar *)StringValuePtr(href), (const xmlChar *)(NIL_P(prefix) ? NULL : StringValuePtr(prefix)) ); } if (!ns) return Qnil ; if(NIL_P(prefix) || node != namespacee) xmlSetNs(node, ns); return Nokogiri_wrap_xml_namespace(node->doc, ns); } /* * call-seq: * new(name, document) * * Create a new node with +name+ sharing GC lifecycle with +document+ */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr doc; xmlNodePtr node; VALUE name; VALUE document; VALUE rest; VALUE rb_node; rb_scan_args(argc, argv, "2*", &name, &document, &rest); Data_Get_Struct(document, xmlDoc, doc); node = xmlNewNode(NULL, (xmlChar *)StringValuePtr(name)); node->doc = doc->doc; nokogiri_root_node(node); rb_node = Nokogiri_wrap_xml_node( klass == cNokogiriXmlNode ? (VALUE)NULL : klass, node ); rb_obj_call_init(rb_node, argc, argv); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } /* * call-seq: * dump_html * * Returns the Node as html. 
*/ static VALUE dump_html(VALUE self) { xmlBufferPtr buf ; xmlNodePtr node ; VALUE html; Data_Get_Struct(self, xmlNode, node); buf = xmlBufferCreate() ; htmlNodeDump(buf, node->doc, node); html = NOKOGIRI_STR_NEW2(buf->content); xmlBufferFree(buf); return html ; } /* * call-seq: * compare(other) * * Compare this Node to +other+ with respect to their Document */ static VALUE compare(VALUE self, VALUE _other) { xmlNodePtr node, other; Data_Get_Struct(self, xmlNode, node); Data_Get_Struct(_other, xmlNode, other); return INT2NUM((long)xmlXPathCmpNodes(other, node)); } /* * call-seq: * process_xincludes(options) * * Loads and substitutes all xinclude elements below the node. The * parser context will be initialized with +options+. */ static VALUE process_xincludes(VALUE self, VALUE options) { int rcode ; xmlNodePtr node; VALUE error_list = rb_ary_new(); Data_Get_Struct(self, xmlNode, node); xmlSetStructuredErrorFunc((void *)error_list, Nokogiri_error_array_pusher); rcode = xmlXIncludeProcessTreeFlags(node, (int)NUM2INT(options)); xmlSetStructuredErrorFunc(NULL, NULL); if (rcode < 0) { xmlErrorPtr error; error = xmlGetLastError(); if(error) rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); else rb_raise(rb_eRuntimeError, "Could not perform xinclude substitution"); } return self; } /* TODO: DOCUMENT ME */ static VALUE in_context(VALUE self, VALUE _str, VALUE _options) { xmlNodePtr node, list = 0, tmp, child_iter, node_children, doc_children; xmlNodeSetPtr set; xmlParserErrors error; VALUE doc, err; int doc_is_empty; Data_Get_Struct(self, xmlNode, node); doc = DOC_RUBY_OBJECT(node->doc); err = rb_iv_get(doc, "@errors"); doc_is_empty = (node->doc->children == NULL) ? 1 : 0; node_children = node->children; doc_children = node->doc->children; xmlSetStructuredErrorFunc((void *)err, Nokogiri_error_array_pusher); /* Twiddle global variable because of a bug in libxml2. 
* http://git.gnome.org/browse/libxml2/commit/?id=e20fb5a72c83cbfc8e4a8aa3943c6be8febadab7 */ #ifndef HTML_PARSE_NOIMPLIED htmlHandleOmittedElem(0); #endif /* This function adds a fake node to the child of +node+. If the parser * does not exit cleanly with XML_ERR_OK, the list is freed. This can * leave the child pointers in a bad state if they were originally empty. * * http://git.gnome.org/browse/libxml2/tree/parser.c#n13177 * */ error = xmlParseInNodeContext(node, StringValuePtr(_str), (int)RSTRING_LEN(_str), (int)NUM2INT(_options), &list); /* xmlParseInNodeContext should not mutate the original document or node, * so reassigning these pointers should be OK. The reason we're reassigning * is because if there were errors, it's possible for the child pointers * to be manipulated. */ if (error != XML_ERR_OK) { node->doc->children = doc_children; node->children = node_children; } /* make sure parent/child pointers are coherent so an unlink will work * properly (#331) */ child_iter = node->doc->children ; while (child_iter) { if (child_iter->parent != (xmlNodePtr)node->doc) child_iter->parent = (xmlNodePtr)node->doc; child_iter = child_iter->next; } #ifndef HTML_PARSE_NOIMPLIED htmlHandleOmittedElem(1); #endif xmlSetStructuredErrorFunc(NULL, NULL); /* Workaround for a libxml2 bug where a parsing error may leave a broken * node reference in node->doc->children. * This workaround is limited to when a parse error occurs, the document * went from having no children to having children, and the context node is * part of a document fragment. * https://bugzilla.gnome.org/show_bug.cgi?id=668155 */ if (error != XML_ERR_OK && doc_is_empty && node->doc->children != NULL) { child_iter = node; while (child_iter->parent) child_iter = child_iter->parent; if (child_iter->type == XML_DOCUMENT_FRAG_NODE) node->doc->children = NULL; } /* FIXME: This probably needs to handle more constants... 
*/ switch (error) { case XML_ERR_INTERNAL_ERROR: case XML_ERR_NO_MEMORY: rb_raise(rb_eRuntimeError, "error parsing fragment (%d)", error); break; default: break; } set = xmlXPathNodeSetCreate(NULL); while (list) { tmp = list->next; list->next = NULL; xmlXPathNodeSetAddUnique(set, list); nokogiri_root_node(list); list = tmp; } return Nokogiri_wrap_xml_node_set(set, doc); } VALUE Nokogiri_wrap_xml_node(VALUE klass, xmlNodePtr node) { VALUE document = Qnil ; VALUE node_cache = Qnil ; VALUE rb_node = Qnil ; nokogiriTuplePtr node_has_a_document; xmlDocPtr doc; void (*mark_method)(xmlNodePtr) = NULL ; assert(node); if(node->type == XML_DOCUMENT_NODE || node->type == XML_HTML_DOCUMENT_NODE) return DOC_RUBY_OBJECT(node->doc); /* It's OK if the node doesn't have a fully-realized document (as in XML::Reader). */ /* see https://github.com/sparklemotion/nokogiri/issues/95 */ /* and https://github.com/sparklemotion/nokogiri/issues/439 */ doc = node->doc; if (doc->type == XML_DOCUMENT_FRAG_NODE) doc = doc->doc; node_has_a_document = DOC_RUBY_OBJECT_TEST(doc); if(node->_private && node_has_a_document) return (VALUE)node->_private; if(!RTEST(klass)) { switch(node->type) { case XML_ELEMENT_NODE: klass = cNokogiriXmlElement; break; case XML_TEXT_NODE: klass = cNokogiriXmlText; break; case XML_ATTRIBUTE_NODE: klass = cNokogiriXmlAttr; break; case XML_ENTITY_REF_NODE: klass = cNokogiriXmlEntityReference; break; case XML_COMMENT_NODE: klass = cNokogiriXmlComment; break; case XML_DOCUMENT_FRAG_NODE: klass = cNokogiriXmlDocumentFragment; break; case XML_PI_NODE: klass = cNokogiriXmlProcessingInstruction; break; case XML_ENTITY_DECL: klass = cNokogiriXmlEntityDecl; break; case XML_CDATA_SECTION_NODE: klass = cNokogiriXmlCData; break; case XML_DTD_NODE: klass = cNokogiriXmlDtd; break; case XML_ATTRIBUTE_DECL: klass = cNokogiriXmlAttributeDecl; break; case XML_ELEMENT_DECL: klass = cNokogiriXmlElementDecl; break; default: klass = cNokogiriXmlNode; } } mark_method = node_has_a_document ? 
mark : NULL ; rb_node = Data_Wrap_Struct(klass, mark_method, debug_node_dealloc, node) ; node->_private = (void *)rb_node; if (node_has_a_document) { document = DOC_RUBY_OBJECT(doc); node_cache = DOC_NODE_CACHE(doc); rb_ary_push(node_cache, rb_node); rb_funcall(document, decorate, 1, rb_node); } return rb_node ; } void Nokogiri_xml_node_properties(xmlNodePtr node, VALUE attr_list) { xmlAttrPtr prop; prop = node->properties ; while (prop != NULL) { rb_ary_push(attr_list, Nokogiri_wrap_xml_node(Qnil, (xmlNodePtr)prop)); prop = prop->next ; } } VALUE cNokogiriXmlNode ; VALUE cNokogiriXmlElement ; void init_xml_node() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE klass = rb_define_class_under(xml, "Node", rb_cObject); cNokogiriXmlNode = klass; cNokogiriXmlElement = rb_define_class_under(xml, "Element", klass); rb_define_singleton_method(klass, "new", new, -1); rb_define_method(klass, "add_namespace_definition", add_namespace_definition, 2); rb_define_method(klass, "node_name", get_name, 0); rb_define_method(klass, "document", document, 0); rb_define_method(klass, "node_name=", set_name, 1); rb_define_method(klass, "parent", get_parent, 0); rb_define_method(klass, "child", child, 0); rb_define_method(klass, "first_element_child", first_element_child, 0); rb_define_method(klass, "last_element_child", last_element_child, 0); rb_define_method(klass, "children", children, 0); rb_define_method(klass, "element_children", element_children, 0); rb_define_method(klass, "next_sibling", next_sibling, 0); rb_define_method(klass, "previous_sibling", previous_sibling, 0); rb_define_method(klass, "next_element", next_element, 0); rb_define_method(klass, "previous_element", previous_element, 0); rb_define_method(klass, "node_type", node_type, 0); rb_define_method(klass, "content", get_content, 0); rb_define_method(klass, "path", path, 0); rb_define_method(klass, "key?", key_eh, 1); rb_define_method(klass, "namespaced_key?", 
namespaced_key_eh, 2); rb_define_method(klass, "blank?", blank_eh, 0); rb_define_method(klass, "attribute_nodes", attribute_nodes, 0); rb_define_method(klass, "attribute", attr, 1); rb_define_method(klass, "attribute_with_ns", attribute_with_ns, 2); rb_define_method(klass, "namespace", namespace, 0); rb_define_method(klass, "namespace_definitions", namespace_definitions, 0); rb_define_method(klass, "namespace_scopes", namespace_scopes, 0); rb_define_method(klass, "encode_special_chars", encode_special_chars, 1); rb_define_method(klass, "dup", duplicate_node, -1); rb_define_method(klass, "unlink", unlink_node, 0); rb_define_method(klass, "internal_subset", internal_subset, 0); rb_define_method(klass, "external_subset", external_subset, 0); rb_define_method(klass, "create_internal_subset", create_internal_subset, 3); rb_define_method(klass, "create_external_subset", create_external_subset, 3); rb_define_method(klass, "pointer_id", pointer_id, 0); rb_define_method(klass, "line", line, 0); rb_define_method(klass, "native_content=", native_content, 1); rb_define_private_method(klass, "process_xincludes", process_xincludes, 1); rb_define_private_method(klass, "in_context", in_context, 2); rb_define_private_method(klass, "add_child_node", add_child, 1); rb_define_private_method(klass, "add_previous_sibling_node", add_previous_sibling, 1); rb_define_private_method(klass, "add_next_sibling_node", add_next_sibling, 1); rb_define_private_method(klass, "replace_node", replace, 1); rb_define_private_method(klass, "dump_html", dump_html, 0); rb_define_private_method(klass, "native_write_to", native_write_to, 4); rb_define_private_method(klass, "get", get, 1); rb_define_private_method(klass, "set", set, 2); rb_define_private_method(klass, "set_namespace", set_namespace, 1); rb_define_private_method(klass, "compare", compare, 1); decorate = rb_intern("decorate"); decorate_bang = rb_intern("decorate!"); } /* vim: set noet sw=4 sws=4 */ 
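The `get` function in xml_node.c above resolves a qualified attribute name by copying it, overwriting the first colon with a NUL byte, and treating the two halves as namespace prefix and local name. The fragment below is a minimal standalone sketch of only that splitting step, using the C standard library and no libxml2 calls. The names `qname_parts`, `qname_split`, and `qname_free` are hypothetical and do not exist in Nokogiri; the namespace lookup that `get` performs afterwards with `xmlSearchNs` is deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type, not part of Nokogiri. */
typedef struct {
  char *buffer;       /* heap copy of the input; release with qname_free() */
  const char *prefix; /* NULL when the name contains no colon */
  const char *local;  /* attribute name without the prefix */
} qname_parts;

/*
 * Copy +qname+ and split the copy at the first ':' so that prefix and
 * local name become two NUL-terminated strings inside one buffer.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
static int qname_split(const char *qname, qname_parts *out)
{
  char *colon;

  assert(qname != NULL && out != NULL);

  out->buffer = malloc(strlen(qname) + 1);
  if (!out->buffer) return -1;
  strcpy(out->buffer, qname);

  colon = strchr(out->buffer, ':');
  if (colon) {
    *colon = '\0';            /* terminate the prefix in place */
    out->prefix = out->buffer;
    out->local = colon + 1;
  } else {
    out->prefix = NULL;
    out->local = out->buffer;
  }
  return 0;
}

/* Release the copy made by qname_split() and clear the pointers into it. */
static void qname_free(qname_parts *parts)
{
  free(parts->buffer);
  parts->buffer = NULL;
  parts->prefix = NULL;
  parts->local = NULL;
}
```

A caller would pass a name such as "xlink:href", read `prefix` and `local` from the struct, and then release the copy with `qname_free`. Splitting a copy keeps the caller's string intact, which the original relies on when no namespace matches the prefix and it falls back to looking up the unsplit name.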
nokogiri-1.6.1/ext/nokogiri/xslt_stylesheet.c0000644000175000017500000001640212261213762020753 0ustar boutilboutil#include #include #include #include #include VALUE xslt; int vasprintf (char **strp, const char *fmt, va_list ap); void vasprintf_free (void *p); static void mark(nokogiriXsltStylesheetTuple *wrapper) { rb_gc_mark(wrapper->func_instances); } static void dealloc(nokogiriXsltStylesheetTuple *wrapper) { xsltStylesheetPtr doc = wrapper->ss; NOKOGIRI_DEBUG_START(doc); xsltFreeStylesheet(doc); /* commented out for now. */ NOKOGIRI_DEBUG_END(doc); free(wrapper); } static void xslt_generic_error_handler(void * ctx, const char *msg, ...) { char * message; va_list args; va_start(args, msg); vasprintf(&message, msg, args); va_end(args); rb_str_cat2((VALUE)ctx, message); vasprintf_free(message); } VALUE Nokogiri_wrap_xslt_stylesheet(xsltStylesheetPtr ss) { VALUE self; nokogiriXsltStylesheetTuple *wrapper; self = Data_Make_Struct(cNokogiriXsltStylesheet, nokogiriXsltStylesheetTuple, mark, dealloc, wrapper); ss->_private = (void *)self; wrapper->ss = ss; wrapper->func_instances = rb_ary_new(); return self; } /* * call-seq: * parse_stylesheet_doc(document) * * Parse a stylesheet from +document+. */ static VALUE parse_stylesheet_doc(VALUE klass, VALUE xmldocobj) { xmlDocPtr xml, xml_cpy; VALUE errstr, exception; xsltStylesheetPtr ss ; Data_Get_Struct(xmldocobj, xmlDoc, xml); exsltRegisterAll(); errstr = rb_str_new(0, 0); xsltSetGenericErrorFunc((void *)errstr, xslt_generic_error_handler); xml_cpy = xmlCopyDoc(xml, 1); /* 1 => recursive */ ss = xsltParseStylesheetDoc(xml_cpy); xsltSetGenericErrorFunc(NULL, NULL); if (!ss) { xmlFreeDoc(xml_cpy); exception = rb_exc_new3(rb_eRuntimeError, errstr); rb_exc_raise(exception); } return Nokogiri_wrap_xslt_stylesheet(ss); } /* * call-seq: * serialize(document) * * Serialize +document+ to an xml string. 
 */
static VALUE serialize(VALUE self, VALUE xmlobj)
{
  xmlDocPtr xml ;
  nokogiriXsltStylesheetTuple *wrapper;
  xmlChar* doc_ptr ;
  int doc_len ;
  VALUE rval ;

  Data_Get_Struct(xmlobj, xmlDoc, xml);
  Data_Get_Struct(self, nokogiriXsltStylesheetTuple, wrapper);
  xsltSaveResultToString(&doc_ptr, &doc_len, xml, wrapper->ss);
  rval = NOKOGIRI_STR_NEW(doc_ptr, doc_len);
  xmlFree(doc_ptr);
  return rval ;
}

static void swallow_superfluous_xml_errors(void * userdata, xmlErrorPtr error, ...)
{
}

/*
 * call-seq:
 *  transform(document, params = [])
 *
 * Apply an XSLT stylesheet to an XML::Document.
 * +params+ is an array of strings used as XSLT parameters.
 * returns Nokogiri::XML::Document
 *
 * Example:
 *
 *   doc = Nokogiri::XML(File.read(ARGV[0]))
 *   xslt = Nokogiri::XSLT(File.read(ARGV[1]))
 *   puts xslt.transform(doc, ['key', 'value'])
 *
 */
static VALUE transform(int argc, VALUE* argv, VALUE self)
{
  VALUE xmldoc, paramobj, errstr, exception ;
  xmlDocPtr xml ;
  xmlDocPtr result ;
  nokogiriXsltStylesheetTuple *wrapper;
  const char** params ;
  long param_len, j ;
  int parse_error_occurred ;

  rb_scan_args(argc, argv, "11", &xmldoc, &paramobj);
  if (NIL_P(paramobj)) { paramobj = rb_ary_new2(0L) ; }
  if (!rb_obj_is_kind_of(xmldoc, cNokogiriXmlDocument))
    rb_raise(rb_eArgError, "argument must be a Nokogiri::XML::Document");

  /* handle hashes as arguments.
*/ if(T_HASH == TYPE(paramobj)) { paramobj = rb_funcall(paramobj, rb_intern("to_a"), 0); paramobj = rb_funcall(paramobj, rb_intern("flatten"), 0); } Check_Type(paramobj, T_ARRAY); Data_Get_Struct(xmldoc, xmlDoc, xml); Data_Get_Struct(self, nokogiriXsltStylesheetTuple, wrapper); param_len = RARRAY_LEN(paramobj); params = calloc((size_t)param_len+1, sizeof(char*)); for (j = 0 ; j < param_len ; j++) { VALUE entry = rb_ary_entry(paramobj, j); const char * ptr = StringValuePtr(entry); params[j] = ptr; } params[param_len] = 0 ; errstr = rb_str_new(0, 0); xsltSetGenericErrorFunc((void *)errstr, xslt_generic_error_handler); xmlSetGenericErrorFunc(NULL, (xmlGenericErrorFunc)&swallow_superfluous_xml_errors); result = xsltApplyStylesheet(wrapper->ss, xml, params); free(params); xsltSetGenericErrorFunc(NULL, NULL); xmlSetGenericErrorFunc(NULL, NULL); parse_error_occurred = (Qfalse == rb_funcall(errstr, rb_intern("empty?"), 0)); if (parse_error_occurred) { exception = rb_exc_new3(rb_eRuntimeError, errstr); rb_exc_raise(exception); } return Nokogiri_wrap_xml_document((VALUE)0, result) ; } static void method_caller(xmlXPathParserContextPtr ctxt, int nargs) { VALUE handler; const char *function_name; xsltTransformContextPtr transform; const xmlChar *functionURI; transform = xsltXPathGetTransformContext(ctxt); functionURI = ctxt->context->functionURI; handler = (VALUE)xsltGetExtData(transform, functionURI); function_name = (const char*)(ctxt->context->function); Nokogiri_marshal_xpath_funcall_and_return_values(ctxt, nargs, handler, (const char*)function_name); } static void * initFunc(xsltTransformContextPtr ctxt, const xmlChar *uri) { VALUE modules = rb_iv_get(xslt, "@modules"); VALUE obj = rb_hash_aref(modules, rb_str_new2((const char *)uri)); VALUE args = { Qfalse }; VALUE methods = rb_funcall(obj, rb_intern("instance_methods"), 1, args); VALUE inst; nokogiriXsltStylesheetTuple *wrapper; int i; for(i = 0; i < RARRAY_LEN(methods); i++) { VALUE method_name = 
rb_obj_as_string(rb_ary_entry(methods, i));
    xsltRegisterExtFunction(ctxt,
        (unsigned char *)StringValuePtr(method_name), uri, method_caller);
  }

  Data_Get_Struct(ctxt->style->_private, nokogiriXsltStylesheetTuple,
                  wrapper);
  inst = rb_class_new_instance(0, NULL, obj);
  rb_ary_push(wrapper->func_instances, inst);

  return (void *)inst;
}

static void shutdownFunc(xsltTransformContextPtr ctxt,
    const xmlChar *uri, void *data)
{
  nokogiriXsltStylesheetTuple *wrapper;

  Data_Get_Struct(ctxt->style->_private, nokogiriXsltStylesheetTuple,
                  wrapper);

  rb_ary_clear(wrapper->func_instances);
}

/*
 * call-seq:
 *  register(uri, custom_handler_class)
 *
 * Register a class that implements custom XSLT transformation functions.
 */
static VALUE registr(VALUE self, VALUE uri, VALUE obj)
{
  VALUE modules = rb_iv_get(self, "@modules");
  if(NIL_P(modules)) rb_raise(rb_eRuntimeError, "wtf! @modules isn't set");

  rb_hash_aset(modules, uri, obj);
  xsltRegisterExtModule((unsigned char *)StringValuePtr(uri), initFunc, shutdownFunc);
  return self;
}

VALUE cNokogiriXsltStylesheet ;
void init_xslt_stylesheet()
{
  VALUE nokogiri;
  VALUE klass;

  nokogiri = rb_define_module("Nokogiri");
  xslt = rb_define_module_under(nokogiri, "XSLT");
  klass = rb_define_class_under(xslt, "Stylesheet", rb_cObject);

  rb_iv_set(xslt, "@modules", rb_hash_new());

  cNokogiriXsltStylesheet = klass;

  rb_define_singleton_method(klass, "parse_stylesheet_doc", parse_stylesheet_doc, 1);
  rb_define_singleton_method(xslt, "register", registr, 2);
  rb_define_method(klass, "serialize", serialize, 1);
  rb_define_method(klass, "transform", transform, -1);
}
nokogiri-1.6.1/ext/nokogiri/xml_processing_instruction.c0000644000175000017500000000246612261213762023212 0ustar boutilboutil#include

/*
 * call-seq:
 *  new(document, name, content)
 *
 * Create a new ProcessingInstruction element on the +document+ with +name+
 * and +content+
 */
static VALUE new(int argc, VALUE *argv, VALUE klass)
{
  xmlDocPtr xml_doc;
  xmlNodePtr node;
  VALUE document;
  VALUE name;
  VALUE content;
VALUE rest; VALUE rb_node; rb_scan_args(argc, argv, "3*", &document, &name, &content, &rest); Data_Get_Struct(document, xmlDoc, xml_doc); node = xmlNewDocPI( xml_doc, (const xmlChar *)StringValuePtr(name), (const xmlChar *)StringValuePtr(content) ); nokogiri_root_node(node); rb_node = Nokogiri_wrap_xml_node(klass, node); rb_obj_call_init(rb_node, argc, argv); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } VALUE cNokogiriXmlProcessingInstruction; void init_xml_processing_instruction() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); /* * ProcessingInstruction represents a ProcessingInstruction node in an xml * document. */ VALUE klass = rb_define_class_under(xml, "ProcessingInstruction", node); cNokogiriXmlProcessingInstruction = klass; rb_define_singleton_method(klass, "new", new, -1); } nokogiri-1.6.1/ext/nokogiri/html_sax_push_parser.h0000644000175000017500000000027012261213762021743 0ustar boutilboutil#ifndef NOKOGIRI_HTML_SAX_PUSH_PARSER #define NOKOGIRI_HTML_SAX_PUSH_PARSER #include void init_html_sax_push_parser(); extern VALUE cNokogiriHtmlSaxPushParser ; #endif nokogiri-1.6.1/ext/nokogiri/xml_document_fragment.c0000644000175000017500000000212112261213762022062 0ustar boutilboutil#include /* * call-seq: * new(document) * * Create a new DocumentFragment element on the +document+ */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr xml_doc; xmlNodePtr node; VALUE document; VALUE rest; VALUE rb_node; rb_scan_args(argc, argv, "1*", &document, &rest); Data_Get_Struct(document, xmlDoc, xml_doc); node = xmlNewDocFragment(xml_doc->doc); nokogiri_root_node(node); rb_node = Nokogiri_wrap_xml_node(klass, node); rb_obj_call_init(rb_node, argc, argv); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } VALUE cNokogiriXmlDocumentFragment; void init_xml_document_fragment() { VALUE nokogiri = rb_define_module("Nokogiri"); 
VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); /* * DocumentFragment represents a DocumentFragment node in an xml document. */ VALUE klass = rb_define_class_under(xml, "DocumentFragment", node); cNokogiriXmlDocumentFragment = klass; rb_define_singleton_method(klass, "new", new, -1); } nokogiri-1.6.1/ext/nokogiri/xml_io.c0000644000175000017500000000213212261213762016772 0ustar boutilboutil#include static ID id_read, id_write; VALUE read_check(VALUE *args) { return rb_funcall(args[0], id_read, 1, args[1]); } VALUE read_failed(void) { return Qnil; } int io_read_callback(void * ctx, char * buffer, int len) { VALUE string, args[2]; size_t str_len, safe_len; args[0] = (VALUE)ctx; args[1] = INT2NUM(len); string = rb_rescue(read_check, (VALUE)args, read_failed, 0); if(NIL_P(string)) return 0; str_len = (size_t)RSTRING_LEN(string); safe_len = str_len > (size_t)len ? (size_t)len : str_len; memcpy(buffer, StringValuePtr(string), safe_len); return (int)safe_len; } VALUE write_check(VALUE *args) { return rb_funcall(args[0], id_write, 1, args[1]); } VALUE write_failed(void) { return Qnil; } int io_write_callback(void * ctx, char * buffer, int len) { VALUE args[2]; args[0] = (VALUE)ctx; args[1] = rb_str_new(buffer, (long)len); rb_rescue(write_check, (VALUE)args, write_failed, 0); return len; } int io_close_callback(void * ctx) { return 0; } void init_nokogiri_io() { id_read = rb_intern("read"); id_write = rb_intern("write"); } nokogiri-1.6.1/ext/nokogiri/html_document.h0000644000175000017500000000023712261213762020356 0ustar boutilboutil#ifndef NOKOGIRI_HTML_DOCUMENT #define NOKOGIRI_HTML_DOCUMENT #include void init_html_document(); extern VALUE cNokogiriHtmlDocument ; #endif nokogiri-1.6.1/ext/nokogiri/xml_attribute_decl.c0000644000175000017500000000262412261213762021363 0ustar boutilboutil#include /* * call-seq: * attribute_type * * The attribute_type for this AttributeDecl */ static VALUE attribute_type(VALUE 
self) { xmlAttributePtr node; Data_Get_Struct(self, xmlAttribute, node); return INT2NUM((long)node->atype); } /* * call-seq: * default * * The default value */ static VALUE default_value(VALUE self) { xmlAttributePtr node; Data_Get_Struct(self, xmlAttribute, node); if(node->defaultValue) return NOKOGIRI_STR_NEW2(node->defaultValue); return Qnil; } /* * call-seq: * enumeration * * An enumeration of possible values */ static VALUE enumeration(VALUE self) { xmlAttributePtr node; xmlEnumerationPtr enm; VALUE list; Data_Get_Struct(self, xmlAttribute, node); list = rb_ary_new(); enm = node->tree; while(enm) { rb_ary_push(list, NOKOGIRI_STR_NEW2(enm->name)); enm = enm->next; } return list; } VALUE cNokogiriXmlAttributeDecl; void init_xml_attribute_decl() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); VALUE klass = rb_define_class_under(xml, "AttributeDecl", node); cNokogiriXmlAttributeDecl = klass; rb_define_method(klass, "attribute_type", attribute_type, 0); rb_define_method(klass, "default", default_value, 0); rb_define_method(klass, "enumeration", enumeration, 0); } nokogiri-1.6.1/ext/nokogiri/xml_attr.h0000644000175000017500000000021112261213762017336 0ustar boutilboutil#ifndef NOKOGIRI_XML_ATTR #define NOKOGIRI_XML_ATTR #include void init_xml_attr(); extern VALUE cNokogiriXmlAttr; #endif nokogiri-1.6.1/ext/nokogiri/xml_relax_ng.h0000644000175000017500000000023012261213762020164 0ustar boutilboutil#ifndef NOKOGIRI_XML_RELAX_NG #define NOKOGIRI_XML_RELAX_NG #include void init_xml_relax_ng(); extern VALUE cNokogiriXmlRelaxNG; #endif nokogiri-1.6.1/ext/nokogiri/xslt_stylesheet.h0000644000175000017500000000044412261213762020757 0ustar boutilboutil#ifndef NOKOGIRI_XSLT_STYLESHEET #define NOKOGIRI_XSLT_STYLESHEET #include void init_xslt_stylesheet(); extern VALUE cNokogiriXsltStylesheet ; typedef struct _nokogiriXsltStylesheetTuple { xsltStylesheetPtr ss; 
VALUE func_instances; } nokogiriXsltStylesheetTuple; #endif nokogiri-1.6.1/ext/nokogiri/xml_syntax_error.h0000644000175000017500000000056412261213762021136 0ustar boutilboutil#ifndef NOKOGIRI_XML_SYNTAX_ERROR #define NOKOGIRI_XML_SYNTAX_ERROR #include <nokogiri.h> void init_xml_syntax_error(); VALUE Nokogiri_wrap_xml_syntax_error(VALUE klass, xmlErrorPtr error); void Nokogiri_error_array_pusher(void * ctx, xmlErrorPtr error); NORETURN(void Nokogiri_error_raise(void * ctx, xmlErrorPtr error)); extern VALUE cNokogiriXmlSyntaxError; #endif nokogiri-1.6.1/ext/nokogiri/xml_document.c0000644000175000017500000003503712261213762020213 0ustar boutilboutil#include <xml_document.h> static int dealloc_node_i(xmlNodePtr key, xmlNodePtr node, xmlDocPtr doc) { switch(node->type) { case XML_ATTRIBUTE_NODE: xmlFreePropList((xmlAttrPtr)node); break; case XML_NAMESPACE_DECL: xmlFree(node); break; default: if(node->parent == NULL) { xmlAddChild((xmlNodePtr)doc, node); } } return ST_CONTINUE; } static void dealloc(xmlDocPtr doc) { xmlDeregisterNodeFunc func; st_table *node_hash; NOKOGIRI_DEBUG_START(doc); func = xmlDeregisterNodeDefault(NULL); node_hash = DOC_UNLINKED_NODE_HASH(doc); st_foreach(node_hash, dealloc_node_i, (st_data_t)doc); st_free_table(node_hash); free(doc->_private); doc->_private = NULL; xmlFreeDoc(doc); xmlDeregisterNodeDefault(func); NOKOGIRI_DEBUG_END(doc); } static void recursively_remove_namespaces_from_node(xmlNodePtr node) { xmlNodePtr child ; xmlAttrPtr property ; xmlSetNs(node, NULL); for (child = node->children ; child ; child = child->next) recursively_remove_namespaces_from_node(child); if (((node->type == XML_ELEMENT_NODE) || (node->type == XML_XINCLUDE_START) || (node->type == XML_XINCLUDE_END)) && node->nsDef) { xmlFreeNsList(node->nsDef); node->nsDef = NULL; } if (node->type == XML_ELEMENT_NODE && node->properties != NULL) { property = node->properties ; while (property != NULL) { if (property->ns) property->ns = NULL ; property = property->next ; } } } /* * call-seq: * url * * Get
the url name for this document. */ static VALUE url(VALUE self) { xmlDocPtr doc; Data_Get_Struct(self, xmlDoc, doc); if(doc->URL) return NOKOGIRI_STR_NEW2(doc->URL); return Qnil; } /* * call-seq: * root= * * Set the root element on this document */ static VALUE set_root(VALUE self, VALUE root) { xmlDocPtr doc; xmlNodePtr new_root; xmlNodePtr old_root; Data_Get_Struct(self, xmlDoc, doc); old_root = NULL; if(NIL_P(root)) { old_root = xmlDocGetRootElement(doc); if(old_root) { xmlUnlinkNode(old_root); nokogiri_root_node(old_root); } return root; } Data_Get_Struct(root, xmlNode, new_root); /* If the new root's document is not the same as the current document, * then we need to dup the node in to this document. */ if(new_root->doc != doc) { old_root = xmlDocGetRootElement(doc); if (!(new_root = xmlDocCopyNode(new_root, doc, 1))) { rb_raise(rb_eRuntimeError, "Could not reparent node (xmlDocCopyNode)"); } } xmlDocSetRootElement(doc, new_root); if(old_root) nokogiri_root_node(old_root); return root; } /* * call-seq: * root * * Get the root node for this document. 
*/ static VALUE root(VALUE self) { xmlDocPtr doc; xmlNodePtr root; Data_Get_Struct(self, xmlDoc, doc); root = xmlDocGetRootElement(doc); if(!root) return Qnil; return Nokogiri_wrap_xml_node(Qnil, root) ; } /* * call-seq: * encoding= encoding * * Set the encoding string for this Document */ static VALUE set_encoding(VALUE self, VALUE encoding) { xmlDocPtr doc; Data_Get_Struct(self, xmlDoc, doc); /* doc->encoding is allocated by libxml2, so release it with xmlFree; the cast may produce a gcc cast warning */ if (doc->encoding) xmlFree((char *) doc->encoding); doc->encoding = xmlStrdup((xmlChar *)StringValuePtr(encoding)); return encoding; } /* * call-seq: * encoding * * Get the encoding for this Document */ static VALUE encoding(VALUE self) { xmlDocPtr doc; Data_Get_Struct(self, xmlDoc, doc); if(!doc->encoding) return Qnil; return NOKOGIRI_STR_NEW2(doc->encoding); } /* * call-seq: * version * * Get the XML version for this Document */ static VALUE version(VALUE self) { xmlDocPtr doc; Data_Get_Struct(self, xmlDoc, doc); if(!doc->version) return Qnil; return NOKOGIRI_STR_NEW2(doc->version); } /* * call-seq: * read_io(io, url, encoding, options) * * Create a new document from an IO object */ static VALUE read_io( VALUE klass, VALUE io, VALUE url, VALUE encoding, VALUE options ) { const char * c_url = NIL_P(url) ? NULL : StringValuePtr(url); const char * c_enc = NIL_P(encoding) ?
NULL : StringValuePtr(encoding); VALUE error_list = rb_ary_new(); VALUE document; xmlDocPtr doc; xmlResetLastError(); xmlSetStructuredErrorFunc((void *)error_list, Nokogiri_error_array_pusher); doc = xmlReadIO( (xmlInputReadCallback)io_read_callback, (xmlInputCloseCallback)io_close_callback, (void *)io, c_url, c_enc, (int)NUM2INT(options) ); xmlSetStructuredErrorFunc(NULL, NULL); if(doc == NULL) { xmlErrorPtr error; xmlFreeDoc(doc); error = xmlGetLastError(); if(error) rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } document = Nokogiri_wrap_xml_document(klass, doc); rb_iv_set(document, "@errors", error_list); return document; } /* * call-seq: * read_memory(string, url, encoding, options) * * Create a new document from a String */ static VALUE read_memory( VALUE klass, VALUE string, VALUE url, VALUE encoding, VALUE options ) { const char * c_buffer = StringValuePtr(string); const char * c_url = NIL_P(url) ? NULL : StringValuePtr(url); const char * c_enc = NIL_P(encoding) ? NULL : StringValuePtr(encoding); int len = (int)RSTRING_LEN(string); VALUE error_list = rb_ary_new(); VALUE document; xmlDocPtr doc; xmlResetLastError(); xmlSetStructuredErrorFunc((void *)error_list, Nokogiri_error_array_pusher); doc = xmlReadMemory(c_buffer, len, c_url, c_enc, (int)NUM2INT(options)); xmlSetStructuredErrorFunc(NULL, NULL); if(doc == NULL) { xmlErrorPtr error; xmlFreeDoc(doc); error = xmlGetLastError(); if(error) rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); else rb_raise(rb_eRuntimeError, "Could not parse document"); return Qnil; } document = Nokogiri_wrap_xml_document(klass, doc); rb_iv_set(document, "@errors", error_list); return document; } /* * call-seq: * dup * * Copy this Document. An optional depth may be passed in, but it defaults * to a deep copy. 0 is a shallow copy, 1 is a deep copy. 
*/ static VALUE duplicate_node(int argc, VALUE *argv, VALUE self) { xmlDocPtr doc, dup; VALUE level; if(rb_scan_args(argc, argv, "01", &level) == 0) level = INT2NUM((long)1); Data_Get_Struct(self, xmlDoc, doc); dup = xmlCopyDoc(doc, (int)NUM2INT(level)); if(dup == NULL) return Qnil; dup->type = doc->type; return Nokogiri_wrap_xml_document(rb_obj_class(self), dup); } /* * call-seq: * new(version = default) * * Create a new document with +version+ (defaults to "1.0") */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr doc; VALUE version, rest, rb_doc ; rb_scan_args(argc, argv, "0*", &rest); version = rb_ary_entry(rest, (long)0); if (NIL_P(version)) version = rb_str_new2("1.0"); doc = xmlNewDoc((xmlChar *)StringValuePtr(version)); rb_doc = Nokogiri_wrap_xml_document(klass, doc); rb_obj_call_init(rb_doc, argc, argv); return rb_doc ; } /* * call-seq: * remove_namespaces! * * Remove all namespaces from all nodes in the document. * * This could be useful for developers who either don't understand namespaces * or don't care about them. * * The following example shows a use case, and you can decide for yourself * whether this is a good thing or not: * * doc = Nokogiri::XML <<-EOXML * <root> * <car xmlns:part="http://general-motors.com/"> * <part:tire>Michelin Model XGV</part:tire> * </car> * <bicycle xmlns:part="http://schwinn.com/"> * <part:tire>I'm a bicycle tire!</part:tire> * </bicycle> * </root> * EOXML * * doc.xpath("//tire").to_s # => "" * doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => "<part:tire>Michelin Model XGV</part:tire>" * doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => "<part:tire>I'm a bicycle tire!</part:tire>" * * doc.remove_namespaces! * * doc.xpath("//tire").to_s # => "<tire>Michelin Model XGV</tire><tire>I'm a bicycle tire!</tire>"
* doc.xpath("//part:tire", "part" => "http://general-motors.com/").to_s # => "" * doc.xpath("//part:tire", "part" => "http://schwinn.com/").to_s # => "" * * For more information on why this probably is *not* a good thing in general, * please direct your browser to * http://tenderlovemaking.com/2009/04/23/namespaces-in-xml.html */ VALUE remove_namespaces_bang(VALUE self) { xmlDocPtr doc ; Data_Get_Struct(self, xmlDoc, doc); recursively_remove_namespaces_from_node((xmlNodePtr)doc); return self; } /* call-seq: doc.create_entity(name, type, external_id, system_id, content) * * Create a new entity named +name+. * * +type+ is an integer representing the type of entity to be created, and it * defaults to Nokogiri::XML::EntityDecl::INTERNAL_GENERAL. See * the constants on Nokogiri::XML::EntityDecl for more information. * * +external_id+, +system_id+, and +content+ set the External ID, System ID, * and content respectively. All of these parameters are optional. */ static VALUE create_entity(int argc, VALUE *argv, VALUE self) { VALUE name; VALUE type; VALUE external_id; VALUE system_id; VALUE content; xmlEntityPtr ptr; xmlDocPtr doc ; Data_Get_Struct(self, xmlDoc, doc); rb_scan_args(argc, argv, "14", &name, &type, &external_id, &system_id, &content); xmlResetLastError(); ptr = xmlAddDocEntity( doc, (xmlChar *)(NIL_P(name) ? NULL : StringValuePtr(name)), (int) (NIL_P(type) ? XML_INTERNAL_GENERAL_ENTITY : NUM2INT(type)), (xmlChar *)(NIL_P(external_id) ? NULL : StringValuePtr(external_id)), (xmlChar *)(NIL_P(system_id) ? NULL : StringValuePtr(system_id)), (xmlChar *)(NIL_P(content) ? 
NULL : StringValuePtr(content)) ); if(NULL == ptr) { xmlErrorPtr error = xmlGetLastError(); if(error) rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); else rb_raise(rb_eRuntimeError, "Could not create entity"); return Qnil; } return Nokogiri_wrap_xml_node(cNokogiriXmlEntityDecl, (xmlNodePtr)ptr); } static int block_caller(void * ctx, xmlNodePtr _node, xmlNodePtr _parent) { VALUE block; VALUE node; VALUE parent; VALUE ret; if(_node->type == XML_NAMESPACE_DECL){ node = Nokogiri_wrap_xml_namespace(_parent->doc, (xmlNsPtr) _node); } else{ node = Nokogiri_wrap_xml_node(Qnil, _node); } parent = _parent ? Nokogiri_wrap_xml_node(Qnil, _parent) : Qnil; block = (VALUE)ctx; ret = rb_funcall(block, rb_intern("call"), 2, node, parent); if(Qfalse == ret || Qnil == ret) return 0; return 1; } /* call-seq: * doc.canonicalize(mode=XML_C14N_1_0,inclusive_namespaces=nil,with_comments=false) * doc.canonicalize { |obj, parent| ... } * * Canonicalize a document and return the results. Takes an optional block * that takes two parameters: the +obj+ and that node's +parent+. * The +obj+ will be either a Nokogiri::XML::Node, or a Nokogiri::XML::Namespace * The block must return a non-nil, non-false value if the +obj+ passed in * should be included in the canonicalized document. 
*/ static VALUE canonicalize(int argc, VALUE* argv, VALUE self) { VALUE mode; VALUE incl_ns; VALUE with_comments; xmlChar **ns; long ns_len, i; xmlDocPtr doc; xmlOutputBufferPtr buf; xmlC14NIsVisibleCallback cb = NULL; void * ctx = NULL; VALUE rb_cStringIO; VALUE io; rb_scan_args(argc, argv, "03", &mode, &incl_ns, &with_comments); Data_Get_Struct(self, xmlDoc, doc); rb_cStringIO = rb_const_get_at(rb_cObject, rb_intern("StringIO")); io = rb_class_new_instance(0, 0, rb_cStringIO); buf = xmlAllocOutputBuffer(NULL); buf->writecallback = (xmlOutputWriteCallback)io_write_callback; buf->closecallback = (xmlOutputCloseCallback)io_close_callback; buf->context = (void *)io; if(rb_block_given_p()) { cb = block_caller; ctx = (void *)rb_block_proc(); } if(NIL_P(incl_ns)){ ns = NULL; } else{ ns_len = RARRAY_LEN(incl_ns); ns = calloc((size_t)ns_len+1, sizeof(xmlChar *)); for (i = 0 ; i < ns_len ; i++) { VALUE entry = rb_ary_entry(incl_ns, i); const char * ptr = StringValuePtr(entry); ns[i] = (xmlChar*) ptr; } } /* RTEST treats both nil and false as "no comments", matching the documented with_comments=false default */ xmlC14NExecute(doc, cb, ctx, (int) (NIL_P(mode) ? 0 : NUM2INT(mode)), ns, (int) (RTEST(with_comments) ? 1 : 0), buf); xmlOutputBufferClose(buf); free(ns); /* release the prefix array (not the strings it points to); free(NULL) is a no-op */ return rb_funcall(io, rb_intern("string"), 0); } VALUE cNokogiriXmlDocument ; void init_xml_document() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); /* * Nokogiri::XML::Document wraps an xml document.
*/ VALUE klass = rb_define_class_under(xml, "Document", node); cNokogiriXmlDocument = klass; rb_define_singleton_method(klass, "read_memory", read_memory, 4); rb_define_singleton_method(klass, "read_io", read_io, 4); rb_define_singleton_method(klass, "new", new, -1); rb_define_method(klass, "root", root, 0); rb_define_method(klass, "root=", set_root, 1); rb_define_method(klass, "encoding", encoding, 0); rb_define_method(klass, "encoding=", set_encoding, 1); rb_define_method(klass, "version", version, 0); rb_define_method(klass, "canonicalize", canonicalize, -1); rb_define_method(klass, "dup", duplicate_node, -1); rb_define_method(klass, "url", url, 0); rb_define_method(klass, "create_entity", create_entity, -1); rb_define_method(klass, "remove_namespaces!", remove_namespaces_bang, 0); } /* this takes klass as a param because it's used for HtmlDocument, too. */ VALUE Nokogiri_wrap_xml_document(VALUE klass, xmlDocPtr doc) { nokogiriTuplePtr tuple = (nokogiriTuplePtr)malloc(sizeof(nokogiriTuple)); VALUE rb_doc = Data_Wrap_Struct( klass ? 
klass : cNokogiriXmlDocument, 0, dealloc, doc ); VALUE cache = rb_ary_new(); rb_iv_set(rb_doc, "@decorators", Qnil); rb_iv_set(rb_doc, "@node_cache", cache); tuple->doc = rb_doc; tuple->unlinkedNodes = st_init_numtable_with_size(128); tuple->node_cache = cache; doc->_private = tuple ; rb_obj_call_init(rb_doc, 0, NULL); return rb_doc ; } nokogiri-1.6.1/ext/nokogiri/xml_dtd.h0000644000175000017500000000020612261213762017143 0ustar boutilboutil#ifndef NOKOGIRI_XML_DTD #define NOKOGIRI_XML_DTD #include <nokogiri.h> extern VALUE cNokogiriXmlDtd; void init_xml_dtd(); #endif nokogiri-1.6.1/ext/nokogiri/xml_comment.c0000644000175000017500000000227312261213762020033 0ustar boutilboutil#include <xml_comment.h> /* * call-seq: * new(document, content) * * Create a new Comment element on the +document+ with +content+ */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr xml_doc; xmlNodePtr node; VALUE document; VALUE content; VALUE rest; VALUE rb_node; rb_scan_args(argc, argv, "2*", &document, &content, &rest); Data_Get_Struct(document, xmlDoc, xml_doc); node = xmlNewDocComment( xml_doc, (const xmlChar *)StringValuePtr(content) ); rb_node = Nokogiri_wrap_xml_node(klass, node); rb_obj_call_init(rb_node, argc, argv); nokogiri_root_node(node); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } VALUE cNokogiriXmlComment; void init_xml_comment() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); VALUE char_data = rb_define_class_under(xml, "CharacterData", node); /* * Comment represents a comment node in an xml document.
*/ VALUE klass = rb_define_class_under(xml, "Comment", char_data); cNokogiriXmlComment = klass; rb_define_singleton_method(klass, "new", new, -1); } nokogiri-1.6.1/ext/nokogiri/xml_element_content.c0000644000175000017500000000504312261213762021552 0ustar boutilboutil#include <xml_element_content.h> VALUE cNokogiriXmlElementContent; /* * call-seq: * name * * Get the required element +name+ */ static VALUE get_name(VALUE self) { xmlElementContentPtr elem; Data_Get_Struct(self, xmlElementContent, elem); if(!elem->name) return Qnil; return NOKOGIRI_STR_NEW2(elem->name); } /* * call-seq: * type * * Get the element content +type+. Possible values are PCDATA, ELEMENT, SEQ, * or OR. */ static VALUE get_type(VALUE self) { xmlElementContentPtr elem; Data_Get_Struct(self, xmlElementContent, elem); return INT2NUM((long)elem->type); } /* * call-seq: * c1 * * Get the first child. */ static VALUE get_c1(VALUE self) { xmlElementContentPtr elem; Data_Get_Struct(self, xmlElementContent, elem); if(!elem->c1) return Qnil; return Nokogiri_wrap_element_content(rb_iv_get(self, "@document"), elem->c1); } /* * call-seq: * c2 * * Get the second child. */ static VALUE get_c2(VALUE self) { xmlElementContentPtr elem; Data_Get_Struct(self, xmlElementContent, elem); if(!elem->c2) return Qnil; return Nokogiri_wrap_element_content(rb_iv_get(self, "@document"), elem->c2); } /* * call-seq: * occur * * Get the element content +occur+ flag. Possible values are ONCE, OPT, MULT * or PLUS. */ static VALUE get_occur(VALUE self) { xmlElementContentPtr elem; Data_Get_Struct(self, xmlElementContent, elem); return INT2NUM((long)elem->ocur); } /* * call-seq: * prefix * * Get the element content namespace +prefix+.
*/ static VALUE get_prefix(VALUE self) { xmlElementContentPtr elem; Data_Get_Struct(self, xmlElementContent, elem); if(!elem->prefix) return Qnil; return NOKOGIRI_STR_NEW2(elem->prefix); } VALUE Nokogiri_wrap_element_content(VALUE doc, xmlElementContentPtr element) { VALUE elem = Data_Wrap_Struct(cNokogiriXmlElementContent, 0, 0, element); /* Setting the document is necessary so that this does not get GC'd until */ /* the document is GC'd */ rb_iv_set(elem, "@document", doc); return elem; } void init_xml_element_content() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE klass = rb_define_class_under(xml, "ElementContent", rb_cObject); cNokogiriXmlElementContent = klass; rb_define_method(klass, "name", get_name, 0); rb_define_method(klass, "type", get_type, 0); rb_define_method(klass, "occur", get_occur, 0); rb_define_method(klass, "prefix", get_prefix, 0); rb_define_private_method(klass, "c1", get_c1, 0); rb_define_private_method(klass, "c2", get_c2, 0); } nokogiri-1.6.1/ext/nokogiri/xml_schema.h0000644000175000017500000000022112261213762017625 0ustar boutilboutil#ifndef NOKOGIRI_XML_SCHEMA #define NOKOGIRI_XML_SCHEMA #include <nokogiri.h> void init_xml_schema(); extern VALUE cNokogiriXmlSchema; #endif nokogiri-1.6.1/ext/nokogiri/xml_document_fragment.h0000644000175000017500000000027512261213762022077 0ustar boutilboutil#ifndef NOKOGIRI_XML_DOCUMENT_FRAGMENT #define NOKOGIRI_XML_DOCUMENT_FRAGMENT #include <nokogiri.h> void init_xml_document_fragment(); extern VALUE cNokogiriXmlDocumentFragment; #endif nokogiri-1.6.1/ext/nokogiri/depend0000644000175000017500000005031612261213762016530 0ustar boutilboutilhtml_document.o: html_document.c html_document.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h
xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h html_element_description.o: html_element_description.c \ html_element_description.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h xml_namespace.h \ xml_encoding_handler.h html_entity_lookup.o: html_entity_lookup.c html_entity_lookup.h \ nokogiri.h xml_io.h xml_document.h html_document.h xml_node.h \ xml_text.h xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h html_sax_parser_context.o: html_sax_parser_context.c \ html_sax_parser_context.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ 
xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h xslt_stylesheet.h xml_syntax_error.h xml_schema.h \ xml_relax_ng.h html_element_description.h xml_namespace.h \ xml_encoding_handler.h nokogiri.o: nokogiri.c nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_attr.o: xml_attr.c xml_attr.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_attribute_decl.o: xml_attribute_decl.c xml_attribute_decl.h \ nokogiri.h xml_io.h xml_document.h html_entity_lookup.h \ html_document.h xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_element_decl.h xml_entity_decl.h xml_xpath_context.h \ xml_element_content.h xml_sax_parser_context.h xml_sax_parser.h \ xml_sax_push_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ 
html_element_description.h xml_namespace.h xml_encoding_handler.h xml_cdata.o: xml_cdata.c xml_cdata.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h html_document.h xml_node.h \ xml_text.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_comment.o: xml_comment.c xml_comment.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h html_document.h xml_node.h \ xml_text.h xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_node_set.h \ xml_dtd.h xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_document.o: xml_document.c xml_document.h nokogiri.h xml_io.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_document_fragment.o: xml_document_fragment.c \ xml_document_fragment.h nokogiri.h xml_io.h 
xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_dtd.o: xml_dtd.c xml_dtd.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_element_content.o: xml_element_content.c xml_element_content.h \ nokogiri.h xml_io.h xml_document.h html_entity_lookup.h \ html_document.h xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_sax_parser_context.h xml_sax_parser.h \ xml_sax_push_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_element_decl.o: xml_element_decl.c xml_element_decl.h nokogiri.h \ xml_io.h xml_document.h html_entity_lookup.h html_document.h \ xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h 
xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_entity_decl.h xml_xpath_context.h \ xml_element_content.h xml_sax_parser_context.h xml_sax_parser.h \ xml_sax_push_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_encoding_handler.o: xml_encoding_handler.c xml_encoding_handler.h \ nokogiri.h xml_io.h xml_document.h html_entity_lookup.h \ html_document.h xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_entity_decl.o: xml_entity_decl.c xml_entity_decl.h nokogiri.h \ xml_io.h xml_document.h html_entity_lookup.h html_document.h \ xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_xpath_context.h \ xml_element_content.h xml_sax_parser_context.h xml_sax_parser.h \ xml_sax_push_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_entity_reference.o: xml_entity_reference.c xml_entity_reference.h \ nokogiri.h xml_io.h xml_document.h html_entity_lookup.h \ html_document.h xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ 
xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_io.o: xml_io.c xml_io.h nokogiri.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_namespace.o: xml_namespace.c xml_namespace.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h html_document.h xml_node.h \ xml_text.h xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_encoding_handler.h xml_node.o: xml_node.c xml_node.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_text.h xml_cdata.h \ xml_attr.h xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h 
xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_node_set.o: xml_node_set.c xml_node_set.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h html_document.h xml_node.h \ xml_text.h xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_dtd.h xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_processing_instruction.o: xml_processing_instruction.c \ xml_processing_instruction.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_entity_reference.h xml_document_fragment.h \ xml_comment.h xml_node_set.h xml_dtd.h xml_attribute_decl.h \ xml_element_decl.h xml_entity_decl.h xml_xpath_context.h \ xml_element_content.h xml_sax_parser_context.h xml_sax_parser.h \ xml_sax_push_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_reader.o: xml_reader.c xml_reader.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h html_document.h xml_node.h \ xml_text.h xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h 
xml_encoding_handler.h xml_relax_ng.o: xml_relax_ng.c xml_relax_ng.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h html_document.h xml_node.h \ xml_text.h xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_schema.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_sax_parser.o: xml_sax_parser.c xml_sax_parser.h nokogiri.h \ xml_io.h xml_document.h html_entity_lookup.h html_document.h \ xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_push_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_sax_parser_context.o: xml_sax_parser_context.c \ xml_sax_parser_context.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_text.h \ xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_sax_push_parser.o: xml_sax_push_parser.c xml_sax_push_parser.h \ nokogiri.h xml_io.h xml_document.h 
html_entity_lookup.h \ html_document.h xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xml_schema.o: xml_schema.c xml_schema.h nokogiri.h xml_io.h \ xml_document.h html_entity_lookup.h html_document.h xml_node.h \ xml_text.h xml_cdata.h xml_attr.h xml_processing_instruction.h \ xml_entity_reference.h xml_document_fragment.h xml_comment.h \ xml_node_set.h xml_dtd.h xml_attribute_decl.h xml_element_decl.h \ xml_entity_decl.h xml_xpath_context.h xml_element_content.h \ xml_sax_parser_context.h xml_sax_parser.h xml_sax_push_parser.h \ xml_reader.h html_sax_parser_context.h xslt_stylesheet.h \ xml_syntax_error.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_syntax_error.o: xml_syntax_error.c xml_syntax_error.h nokogiri.h \ xml_io.h xml_document.h html_entity_lookup.h html_document.h \ xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_schema.h \ xml_relax_ng.h html_element_description.h xml_namespace.h \ xml_encoding_handler.h xml_text.o: xml_text.c xml_text.h nokogiri.h xml_io.h xml_document.h \ html_entity_lookup.h html_document.h xml_node.h xml_cdata.h \ xml_attr.h xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h 
xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xslt_stylesheet.h xml_syntax_error.h \ xml_schema.h xml_relax_ng.h html_element_description.h \ xml_namespace.h xml_encoding_handler.h xml_xpath_context.o: xml_xpath_context.c xml_xpath_context.h \ nokogiri.h xml_io.h xml_document.h html_entity_lookup.h \ html_document.h xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_element_content.h xml_sax_parser_context.h xml_sax_parser.h \ xml_sax_push_parser.h xml_reader.h html_sax_parser_context.h \ xslt_stylesheet.h xml_syntax_error.h xml_schema.h xml_relax_ng.h \ html_element_description.h xml_namespace.h xml_encoding_handler.h xslt_stylesheet.o: xslt_stylesheet.c xslt_stylesheet.h nokogiri.h \ xml_io.h xml_document.h html_entity_lookup.h html_document.h \ xml_node.h xml_text.h xml_cdata.h xml_attr.h \ xml_processing_instruction.h xml_entity_reference.h \ xml_document_fragment.h xml_comment.h xml_node_set.h xml_dtd.h \ xml_attribute_decl.h xml_element_decl.h xml_entity_decl.h \ xml_xpath_context.h xml_element_content.h xml_sax_parser_context.h \ xml_sax_parser.h xml_sax_push_parser.h xml_reader.h \ html_sax_parser_context.h xml_syntax_error.h xml_schema.h \ xml_relax_ng.h html_element_description.h xml_namespace.h \ xml_encoding_handler.h nokogiri-1.6.1/ext/nokogiri/xml_syntax_error.c0000644000175000017500000000312612261213762021126 0ustar boutilboutil#include void Nokogiri_error_array_pusher(void * ctx, xmlErrorPtr error) { VALUE list = (VALUE)ctx; rb_ary_push(list, Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); } void Nokogiri_error_raise(void * ctx, xmlErrorPtr error) { 
rb_exc_raise(Nokogiri_wrap_xml_syntax_error((VALUE)NULL, error)); } VALUE Nokogiri_wrap_xml_syntax_error(VALUE klass, xmlErrorPtr error) { VALUE msg, e; if(!klass) klass = cNokogiriXmlSyntaxError; msg = (error && error->message) ? NOKOGIRI_STR_NEW2(error->message) : Qnil; e = rb_class_new_instance( 1, &msg, klass ); if (error) { rb_iv_set(e, "@domain", INT2NUM(error->domain)); rb_iv_set(e, "@code", INT2NUM(error->code)); rb_iv_set(e, "@level", INT2NUM((short)error->level)); rb_iv_set(e, "@file", RBSTR_OR_QNIL(error->file)); rb_iv_set(e, "@line", INT2NUM(error->line)); rb_iv_set(e, "@str1", RBSTR_OR_QNIL(error->str1)); rb_iv_set(e, "@str2", RBSTR_OR_QNIL(error->str2)); rb_iv_set(e, "@str3", RBSTR_OR_QNIL(error->str3)); rb_iv_set(e, "@int1", INT2NUM(error->int1)); rb_iv_set(e, "@column", INT2NUM(error->int2)); } return e; } VALUE cNokogiriXmlSyntaxError; void init_xml_syntax_error() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); /* * The XML::SyntaxError is raised on parse errors */ VALUE syntax_error_mommy = rb_define_class_under(nokogiri, "SyntaxError", rb_eStandardError); VALUE klass = rb_define_class_under(xml, "SyntaxError", syntax_error_mommy); cNokogiriXmlSyntaxError = klass; } nokogiri-1.6.1/ext/nokogiri/xml_text.c0000644000175000017500000000216612261213762017356 0ustar boutilboutil#include /* * call-seq: * new(content, document) * * Create a new Text element on the +document+ with +content+ */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr doc; xmlNodePtr node; VALUE string; VALUE document; VALUE rest; VALUE rb_node; rb_scan_args(argc, argv, "2*", &string, &document, &rest); Data_Get_Struct(document, xmlDoc, doc); node = xmlNewText((xmlChar *)StringValuePtr(string)); node->doc = doc->doc; nokogiri_root_node(node); rb_node = Nokogiri_wrap_xml_node(klass, node) ; rb_obj_call_init(rb_node, argc, argv); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } VALUE cNokogiriXmlText ; 
void init_xml_text() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); /* */ VALUE node = rb_define_class_under(xml, "Node", rb_cObject); VALUE char_data = rb_define_class_under(xml, "CharacterData", node); /* * Wraps Text nodes. */ VALUE klass = rb_define_class_under(xml, "Text", char_data); cNokogiriXmlText = klass; rb_define_singleton_method(klass, "new", new, -1); } nokogiri-1.6.1/ext/nokogiri/nokogiri.c0000644000175000017500000000745612261213762017342 0ustar boutilboutil#include VALUE mNokogiri ; VALUE mNokogiriXml ; VALUE mNokogiriHtml ; VALUE mNokogiriXslt ; VALUE mNokogiriXmlSax ; VALUE mNokogiriHtmlSax ; #ifdef USE_INCLUDED_VASPRINTF /* * I srsly hate windows. it doesn't have vasprintf. * Thank you Geoffroy Couprie for this implementation of vasprintf! */ int vasprintf (char **strp, const char *fmt, va_list ap) { int len = vsnprintf (NULL, 0, fmt, ap) + 1; char *res = (char *)malloc((unsigned int)len); if (res == NULL) return -1; *strp = res; return vsnprintf(res, (unsigned int)len, fmt, ap); } #endif #ifdef USING_SYSTEM_ALLOCATOR_LIBRARY /* Ruby Enterprise Edition with tcmalloc */ void vasprintf_free (void *p) { system_free(p); } #else void vasprintf_free (void *p) { free(p); } #endif #ifdef HAVE_RUBY_UTIL_H #include "ruby/util.h" #else #ifndef __MACRUBY__ #include "util.h" #endif #endif void nokogiri_root_node(xmlNodePtr node) { xmlDocPtr doc; nokogiriTuplePtr tuple; doc = node->doc; if (doc->type == XML_DOCUMENT_FRAG_NODE) doc = doc->doc; tuple = (nokogiriTuplePtr)doc->_private; st_insert(tuple->unlinkedNodes, (st_data_t)node, (st_data_t)node); } void nokogiri_root_nsdef(xmlNsPtr ns, xmlDocPtr doc) { nokogiriTuplePtr tuple; if (doc->type == XML_DOCUMENT_FRAG_NODE) doc = doc->doc; tuple = (nokogiriTuplePtr)doc->_private; st_insert(tuple->unlinkedNodes, (st_data_t)ns, (st_data_t)ns); } void Init_nokogiri() { #ifndef __MACRUBY__ xmlMemSetup( (xmlFreeFunc)ruby_xfree, (xmlMallocFunc)ruby_xmalloc, 
(xmlReallocFunc)ruby_xrealloc, ruby_strdup ); #endif mNokogiri = rb_define_module("Nokogiri"); mNokogiriXml = rb_define_module_under(mNokogiri, "XML"); mNokogiriHtml = rb_define_module_under(mNokogiri, "HTML"); mNokogiriXslt = rb_define_module_under(mNokogiri, "XSLT"); mNokogiriXmlSax = rb_define_module_under(mNokogiriXml, "SAX"); mNokogiriHtmlSax = rb_define_module_under(mNokogiriHtml, "SAX"); rb_const_set( mNokogiri, rb_intern("LIBXML_VERSION"), NOKOGIRI_STR_NEW2(LIBXML_DOTTED_VERSION) ); rb_const_set( mNokogiri, rb_intern("LIBXML_PARSER_VERSION"), NOKOGIRI_STR_NEW2(xmlParserVersion) ); #ifdef NOKOGIRI_USE_PACKAGED_LIBRARIES rb_const_set(mNokogiri, rb_intern("NOKOGIRI_USE_PACKAGED_LIBRARIES"), Qtrue); rb_const_set(mNokogiri, rb_intern("NOKOGIRI_LIBXML2_PATH"), NOKOGIRI_STR_NEW2(NOKOGIRI_LIBXML2_PATH)); rb_const_set(mNokogiri, rb_intern("NOKOGIRI_LIBXSLT_PATH"), NOKOGIRI_STR_NEW2(NOKOGIRI_LIBXSLT_PATH)); #else rb_const_set(mNokogiri, rb_intern("NOKOGIRI_USE_PACKAGED_LIBRARIES"), Qfalse); rb_const_set(mNokogiri, rb_intern("NOKOGIRI_LIBXML2_PATH"), Qnil); rb_const_set(mNokogiri, rb_intern("NOKOGIRI_LIBXSLT_PATH"), Qnil); #endif #ifdef LIBXML_ICONV_ENABLED rb_const_set(mNokogiri, rb_intern("LIBXML_ICONV_ENABLED"), Qtrue); #else rb_const_set(mNokogiri, rb_intern("LIBXML_ICONV_ENABLED"), Qfalse); #endif xmlInitParser(); init_xml_document(); init_html_document(); init_xml_node(); init_xml_document_fragment(); init_xml_text(); init_xml_cdata(); init_xml_processing_instruction(); init_xml_attr(); init_xml_entity_reference(); init_xml_comment(); init_xml_node_set(); init_xml_xpath_context(); init_xml_sax_parser_context(); init_xml_sax_parser(); init_xml_sax_push_parser(); init_xml_reader(); init_xml_dtd(); init_xml_element_content(); init_xml_attribute_decl(); init_xml_element_decl(); init_xml_entity_decl(); init_xml_namespace(); init_html_sax_parser_context(); init_html_sax_push_parser(); init_xslt_stylesheet(); init_xml_syntax_error(); init_html_entity_lookup(); 
init_html_element_description(); init_xml_schema(); init_xml_relax_ng(); init_nokogiri_io(); init_xml_encoding_handler(); } nokogiri-1.6.1/ext/nokogiri/xml_attr.c0000644000175000017500000000377412261213762017352 0ustar boutilboutil#include /* * call-seq: * value=(content) * * Set the value for this Attr to +content+ */ static VALUE set_value(VALUE self, VALUE content) { xmlAttrPtr attr; Data_Get_Struct(self, xmlAttr, attr); if(attr->children) xmlFreeNodeList(attr->children); attr->children = attr->last = NULL; if(content) { xmlChar *buffer; xmlNode *tmp; /* Encode our content */ buffer = xmlEncodeEntitiesReentrant(attr->doc, (unsigned char *)StringValuePtr(content)); attr->children = xmlStringGetNodeList(attr->doc, buffer); attr->last = NULL; tmp = attr->children; /* Loop through the children */ for(tmp = attr->children; tmp; tmp = tmp->next) { tmp->parent = (xmlNode *)attr; tmp->doc = attr->doc; if(tmp->next == NULL) attr->last = tmp; } /* Free up memory */ xmlFree(buffer); } return content; } /* * call-seq: * new(document, name) * * Create a new Attr element on the +document+ with +name+ */ static VALUE new(int argc, VALUE *argv, VALUE klass) { xmlDocPtr xml_doc; VALUE document; VALUE name; VALUE rest; xmlAttrPtr node; VALUE rb_node; rb_scan_args(argc, argv, "2*", &document, &name, &rest); Data_Get_Struct(document, xmlDoc, xml_doc); node = xmlNewDocProp( xml_doc, (const xmlChar *)StringValuePtr(name), NULL ); nokogiri_root_node((xmlNodePtr)node); rb_node = Nokogiri_wrap_xml_node(klass, (xmlNodePtr)node); rb_obj_call_init(rb_node, argc, argv); if(rb_block_given_p()) rb_yield(rb_node); return rb_node; } VALUE cNokogiriXmlAttr; void init_xml_attr() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); /* * Attr represents an Attr node in an XML document.
*/ VALUE klass = rb_define_class_under(xml, "Attr", node); cNokogiriXmlAttr = klass; rb_define_singleton_method(klass, "new", new, -1); rb_define_method(klass, "value=", set_value, 1); } nokogiri-1.6.1/ext/nokogiri/xml_sax_push_parser.c0000644000175000017500000000527712261213762021606 0ustar boutilboutil#include static void deallocate(xmlParserCtxtPtr ctx) { NOKOGIRI_DEBUG_START(ctx); if(ctx != NULL) { NOKOGIRI_SAX_TUPLE_DESTROY(ctx->userData); xmlFreeParserCtxt(ctx); } NOKOGIRI_DEBUG_END(ctx); } static VALUE allocate(VALUE klass) { return Data_Wrap_Struct(klass, NULL, deallocate, NULL); } /* * call-seq: * native_write(chunk, last_chunk) * * Write +chunk+ to PushParser. +last_chunk+ triggers the end_document handler */ static VALUE native_write(VALUE self, VALUE _chunk, VALUE _last_chunk) { xmlParserCtxtPtr ctx; const char * chunk = NULL; int size = 0; Data_Get_Struct(self, xmlParserCtxt, ctx); if(Qnil != _chunk) { chunk = StringValuePtr(_chunk); size = (int)RSTRING_LEN(_chunk); } if(xmlParseChunk(ctx, chunk, size, Qtrue == _last_chunk ?
1 : 0)) { if (!(ctx->options & XML_PARSE_RECOVER)) { xmlErrorPtr e = xmlCtxtGetLastError(ctx); Nokogiri_error_raise(NULL, e); } } return self; } /* * call-seq: * initialize_native(xml_sax, filename) * * Initialize the push parser with +xml_sax+ using +filename+ */ static VALUE initialize_native(VALUE self, VALUE _xml_sax, VALUE _filename) { xmlSAXHandlerPtr sax; const char * filename = NULL; xmlParserCtxtPtr ctx; Data_Get_Struct(_xml_sax, xmlSAXHandler, sax); if(_filename != Qnil) filename = StringValuePtr(_filename); ctx = xmlCreatePushParserCtxt( sax, NULL, NULL, 0, filename ); if(ctx == NULL) rb_raise(rb_eRuntimeError, "Could not create a parser context"); ctx->userData = NOKOGIRI_SAX_TUPLE_NEW(ctx, self); ctx->sax2 = 1; DATA_PTR(self) = ctx; return self; } static VALUE get_options(VALUE self) { xmlParserCtxtPtr ctx; Data_Get_Struct(self, xmlParserCtxt, ctx); return INT2NUM(ctx->options); } static VALUE set_options(VALUE self, VALUE options) { xmlParserCtxtPtr ctx; Data_Get_Struct(self, xmlParserCtxt, ctx); if (xmlCtxtUseOptions(ctx, (int)NUM2INT(options)) != 0) rb_raise(rb_eRuntimeError, "Cannot set XML parser context options"); return Qnil; } VALUE cNokogiriXmlSaxPushParser ; void init_xml_sax_push_parser() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE sax = rb_define_module_under(xml, "SAX"); VALUE klass = rb_define_class_under(sax, "PushParser", rb_cObject); cNokogiriXmlSaxPushParser = klass; rb_define_alloc_func(klass, allocate); rb_define_private_method(klass, "initialize_native", initialize_native, 2); rb_define_private_method(klass, "native_write", native_write, 2); rb_define_method(klass, "options", get_options, 0); rb_define_method(klass, "options=", set_options, 1); } nokogiri-1.6.1/ext/nokogiri/xml_reader.h0000644000175000017500000000022212261213762017630 0ustar boutilboutil#ifndef NOKOGIRI_XML_READER #define NOKOGIRI_XML_READER #include void init_xml_reader(); extern VALUE 
cNokogiriXmlReader; #endif nokogiri-1.6.1/ext/nokogiri/html_sax_parser_context.h0000644000175000017500000000030512261213762022447 0ustar boutilboutil#ifndef NOKOGIRI_HTML_SAX_PARSER_CONTEXT #define NOKOGIRI_HTML_SAX_PARSER_CONTEXT #include extern VALUE cNokogiriHtmlSaxParserContext; void init_html_sax_parser_context(); #endif nokogiri-1.6.1/ext/nokogiri/xml_element_decl.c0000644000175000017500000000253512261213762021012 0ustar boutilboutil#include static ID id_document; /* * call-seq: * element_type * * The element_type */ static VALUE element_type(VALUE self) { xmlElementPtr node; Data_Get_Struct(self, xmlElement, node); return INT2NUM((long)node->etype); } /* * call-seq: * content * * The allowed content for this ElementDecl */ static VALUE content(VALUE self) { xmlElementPtr node; Data_Get_Struct(self, xmlElement, node); if(!node->content) return Qnil; return Nokogiri_wrap_element_content( rb_funcall(self, id_document, 0), node->content ); } /* * call-seq: * prefix * * The namespace prefix for this ElementDecl */ static VALUE prefix(VALUE self) { xmlElementPtr node; Data_Get_Struct(self, xmlElement, node); if(!node->prefix) return Qnil; return NOKOGIRI_STR_NEW2(node->prefix); } VALUE cNokogiriXmlElementDecl; void init_xml_element_decl() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE xml = rb_define_module_under(nokogiri, "XML"); VALUE node = rb_define_class_under(xml, "Node", rb_cObject); VALUE klass = rb_define_class_under(xml, "ElementDecl", node); cNokogiriXmlElementDecl = klass; rb_define_method(klass, "element_type", element_type, 0); rb_define_method(klass, "content", content, 0); rb_define_method(klass, "prefix", prefix, 0); id_document = rb_intern("document"); } nokogiri-1.6.1/ext/nokogiri/xml_xpath_context.c0000644000175000017500000002044012261213762021255 0ustar boutilboutil#include int vasprintf (char **strp, const char *fmt, va_list ap); static void deallocate(xmlXPathContextPtr ctx) { NOKOGIRI_DEBUG_START(ctx); xmlXPathFreeContext(ctx); 
NOKOGIRI_DEBUG_END(ctx); } /* * call-seq: * register_ns(prefix, uri) * * Register the namespace with +prefix+ and +uri+. */ static VALUE register_ns(VALUE self, VALUE prefix, VALUE uri) { xmlXPathContextPtr ctx; Data_Get_Struct(self, xmlXPathContext, ctx); xmlXPathRegisterNs( ctx, (const xmlChar *)StringValuePtr(prefix), (const xmlChar *)StringValuePtr(uri) ); return self; } /* * call-seq: * register_variable(name, value) * * Register the variable +name+ with +value+. */ static VALUE register_variable(VALUE self, VALUE name, VALUE value) { xmlXPathContextPtr ctx; xmlXPathObjectPtr xmlValue; Data_Get_Struct(self, xmlXPathContext, ctx); xmlValue = xmlXPathNewCString(StringValuePtr(value)); xmlXPathRegisterVariable( ctx, (const xmlChar *)StringValuePtr(name), xmlValue ); return self; } void Nokogiri_marshal_xpath_funcall_and_return_values(xmlXPathParserContextPtr ctx, int nargs, VALUE handler, const char* function_name) { int i; VALUE result, doc; VALUE *argv; VALUE node_set = Qnil; xmlNodeSetPtr xml_node_set = NULL; xmlXPathObjectPtr obj; nokogiriNodeSetTuple *node_set_tuple; assert(ctx->context->doc); assert(DOC_RUBY_OBJECT_TEST(ctx->context->doc)); argv = (VALUE *)calloc((size_t)nargs, sizeof(VALUE)); for (i = 0 ; i < nargs ; ++i) { rb_gc_register_address(&argv[i]); } doc = DOC_RUBY_OBJECT(ctx->context->doc); if (nargs > 0) { i = nargs - 1; do { obj = valuePop(ctx); switch(obj->type) { case XPATH_STRING: argv[i] = NOKOGIRI_STR_NEW2(obj->stringval); break; case XPATH_BOOLEAN: argv[i] = obj->boolval == 1 ? 
Qtrue : Qfalse; break; case XPATH_NUMBER: argv[i] = rb_float_new(obj->floatval); break; case XPATH_NODESET: argv[i] = Nokogiri_wrap_xml_node_set(obj->nodesetval, doc); break; default: argv[i] = NOKOGIRI_STR_NEW2(xmlXPathCastToString(obj)); } xmlXPathFreeNodeSetList(obj); } while(i-- > 0); } result = rb_funcall2(handler, rb_intern((const char*)function_name), nargs, argv); for (i = 0 ; i < nargs ; ++i) { rb_gc_unregister_address(&argv[i]); } free(argv); switch(TYPE(result)) { case T_FLOAT: case T_BIGNUM: case T_FIXNUM: xmlXPathReturnNumber(ctx, NUM2DBL(result)); break; case T_STRING: xmlXPathReturnString( ctx, xmlCharStrdup(StringValuePtr(result)) ); break; case T_TRUE: xmlXPathReturnTrue(ctx); break; case T_FALSE: xmlXPathReturnFalse(ctx); break; case T_NIL: break; case T_ARRAY: { VALUE args[2]; args[0] = doc; args[1] = result; node_set = rb_class_new_instance(2, args, cNokogiriXmlNodeSet); Data_Get_Struct(node_set, nokogiriNodeSetTuple, node_set_tuple); xml_node_set = node_set_tuple->node_set; xmlXPathReturnNodeSet(ctx, xmlXPathNodeSetMerge(NULL, xml_node_set)); } break; case T_DATA: if(rb_obj_is_kind_of(result, cNokogiriXmlNodeSet)) { Data_Get_Struct(result, nokogiriNodeSetTuple, node_set_tuple); xml_node_set = node_set_tuple->node_set; /* Copy the node set, otherwise it will get GC'd. 
*/ xmlXPathReturnNodeSet(ctx, xmlXPathNodeSetMerge(NULL, xml_node_set)); break; } default: rb_raise(rb_eRuntimeError, "Invalid return type"); } } static void ruby_funcall(xmlXPathParserContextPtr ctx, int nargs) { VALUE handler = Qnil; const char *function = NULL ; assert(ctx); assert(ctx->context); assert(ctx->context->userData); assert(ctx->context->function); handler = (VALUE)(ctx->context->userData); function = (const char*)(ctx->context->function); Nokogiri_marshal_xpath_funcall_and_return_values(ctx, nargs, handler, function); } static xmlXPathFunction lookup( void *ctx, const xmlChar * name, const xmlChar* ns_uri ) { VALUE xpath_handler = (VALUE)ctx; if(rb_respond_to(xpath_handler, rb_intern((const char *)name))) return ruby_funcall; return NULL; } NORETURN(static void xpath_exception_handler(void * ctx, xmlErrorPtr error)); static void xpath_exception_handler(void * ctx, xmlErrorPtr error) { VALUE xpath = rb_const_get(mNokogiriXml, rb_intern("XPath")); VALUE klass = rb_const_get(xpath, rb_intern("SyntaxError")); rb_exc_raise(Nokogiri_wrap_xml_syntax_error(klass, error)); } NORETURN(static void xpath_generic_exception_handler(void * ctx, const char *msg, ...)); static void xpath_generic_exception_handler(void * ctx, const char *msg, ...) { char * message; va_list args; va_start(args, msg); vasprintf(&message, msg, args); va_end(args); rb_raise(rb_eRuntimeError, "%s", message); } /* * call-seq: * evaluate(search_path, handler = nil) * * Evaluate the +search_path+ returning an XML::XPath object. */ static VALUE evaluate(int argc, VALUE *argv, VALUE self) { VALUE search_path, xpath_handler; VALUE thing = Qnil; xmlXPathContextPtr ctx; xmlXPathObjectPtr xpath; xmlChar *query; Data_Get_Struct(self, xmlXPathContext, ctx); if(rb_scan_args(argc, argv, "11", &search_path, &xpath_handler) == 1) xpath_handler = Qnil; query = (xmlChar *)StringValuePtr(search_path); if(Qnil != xpath_handler) { /* FIXME: not sure if this is the correct place to shove private data. 
*/ ctx->userData = (void *)xpath_handler; xmlXPathRegisterFuncLookup(ctx, lookup, (void *)xpath_handler); } xmlResetLastError(); xmlSetStructuredErrorFunc(NULL, xpath_exception_handler); /* For some reason, xmlXPathEvalExpression will blow up with a generic error */ /* when there is a non existent function. */ xmlSetGenericErrorFunc(NULL, xpath_generic_exception_handler); xpath = xmlXPathEvalExpression(query, ctx); xmlSetStructuredErrorFunc(NULL, NULL); xmlSetGenericErrorFunc(NULL, NULL); if(xpath == NULL) { VALUE xpath = rb_const_get(mNokogiriXml, rb_intern("XPath")); VALUE klass = rb_const_get(xpath, rb_intern("SyntaxError")); xmlErrorPtr error = xmlGetLastError(); rb_exc_raise(Nokogiri_wrap_xml_syntax_error(klass, error)); } assert(ctx->doc); assert(DOC_RUBY_OBJECT_TEST(ctx->doc)); switch(xpath->type) { case XPATH_STRING: thing = NOKOGIRI_STR_NEW2(xpath->stringval); xmlFree(xpath->stringval); break; case XPATH_NODESET: if(NULL == xpath->nodesetval) { thing = Nokogiri_wrap_xml_node_set(xmlXPathNodeSetCreate(NULL), DOC_RUBY_OBJECT(ctx->doc)); } else { thing = Nokogiri_wrap_xml_node_set(xpath->nodesetval, DOC_RUBY_OBJECT(ctx->doc)); } break; case XPATH_NUMBER: thing = rb_float_new(xpath->floatval); break; case XPATH_BOOLEAN: thing = xpath->boolval == 1 ? Qtrue : Qfalse; break; default: thing = Nokogiri_wrap_xml_node_set(xmlXPathNodeSetCreate(NULL), DOC_RUBY_OBJECT(ctx->doc)); } xmlXPathFreeNodeSetList(xpath); return thing; } /* * call-seq: * new(node) * * Create a new XPathContext with +node+ as the reference point. 
*/ static VALUE new(VALUE klass, VALUE nodeobj) { xmlNodePtr node; xmlXPathContextPtr ctx; VALUE self; xmlXPathInit(); Data_Get_Struct(nodeobj, xmlNode, node); ctx = xmlXPathNewContext(node->doc); ctx->node = node; self = Data_Wrap_Struct(klass, 0, deallocate, ctx); /*rb_iv_set(self, "@xpath_handler", Qnil); */ return self; } VALUE cNokogiriXmlXpathContext; void init_xml_xpath_context(void) { VALUE module = rb_define_module("Nokogiri"); /* * Nokogiri::XML */ VALUE xml = rb_define_module_under(module, "XML"); /* * XPathContext is the entry point for searching a Document by using XPath. */ VALUE klass = rb_define_class_under(xml, "XPathContext", rb_cObject); cNokogiriXmlXpathContext = klass; rb_define_singleton_method(klass, "new", new, 1); rb_define_method(klass, "evaluate", evaluate, -1); rb_define_method(klass, "register_variable", register_variable, 2); rb_define_method(klass, "register_ns", register_ns, 2); } nokogiri-1.6.1/ext/nokogiri/html_sax_push_parser.c0000644000175000017500000000417512261213762021746 0ustar boutilboutil#include /* * call-seq: * native_write(chunk, last_chunk) * * Write +chunk+ to PushParser. +last_chunk+ triggers the end_document handler */ static VALUE native_write(VALUE self, VALUE _chunk, VALUE _last_chunk) { xmlParserCtxtPtr ctx; const char * chunk = NULL; int size = 0; Data_Get_Struct(self, xmlParserCtxt, ctx); if(Qnil != _chunk) { chunk = StringValuePtr(_chunk); size = (int)RSTRING_LEN(_chunk); } if(htmlParseChunk(ctx, chunk, size, Qtrue == _last_chunk ?
1 : 0)) { if (!(ctx->options & XML_PARSE_RECOVER)) { xmlErrorPtr e = xmlCtxtGetLastError(ctx); Nokogiri_error_raise(NULL, e); } } return self; } /* * call-seq: * initialize_native(xml_sax, filename, encoding) * * Initialize the push parser with +xml_sax+ using +filename+ and +encoding+ */ static VALUE initialize_native(VALUE self, VALUE _xml_sax, VALUE _filename, VALUE encoding) { htmlSAXHandlerPtr sax; const char * filename = NULL; htmlParserCtxtPtr ctx; xmlCharEncoding enc = XML_CHAR_ENCODING_NONE; Data_Get_Struct(_xml_sax, xmlSAXHandler, sax); if(_filename != Qnil) filename = StringValuePtr(_filename); if (!NIL_P(encoding)) { enc = xmlParseCharEncoding(StringValuePtr(encoding)); if (enc == XML_CHAR_ENCODING_ERROR) rb_raise(rb_eArgError, "Unsupported Encoding"); } ctx = htmlCreatePushParserCtxt( sax, NULL, NULL, 0, filename, enc ); if(ctx == NULL) rb_raise(rb_eRuntimeError, "Could not create a parser context"); ctx->userData = NOKOGIRI_SAX_TUPLE_NEW(ctx, self); ctx->sax2 = 1; DATA_PTR(self) = ctx; return self; } VALUE cNokogiriHtmlSaxPushParser; void init_html_sax_push_parser() { VALUE nokogiri = rb_define_module("Nokogiri"); VALUE html = rb_define_module_under(nokogiri, "HTML"); VALUE sax = rb_define_module_under(html, "SAX"); VALUE klass = rb_define_class_under(sax, "PushParser", cNokogiriXmlSaxPushParser); cNokogiriHtmlSaxPushParser = klass; rb_define_private_method(klass, "initialize_native", initialize_native, 3); rb_define_private_method(klass, "native_write", native_write, 2); } nokogiri-1.6.1/README.ja.rdoc0000644000175000017500000000642612261213762015127 0ustar boutilboutil= Nokogiri (鋸) {}[http://travis-ci.org/sparklemotion/nokogiri] {}[https://codeclimate.com/github/sparklemotion/nokogiri] * http://nokogiri.org/ * http://github.com/sparklemotion/nokogiri/wikis * http://github.com/sparklemotion/nokogiri/tree/master * http://groups.google.com/group/nokogiri-list * http://github.com/sparklemotion/nokogiri/issues == DESCRIPTION: Nokogiri
is an HTML, XML, SAX, XSLT, and Reader parser. Its most important feature is the ability to search documents via XPath or CSS3 selectors. XML is like violence - if it doesn't solve your problems, you are not using enough of it. == FEATURES: * Search via XPath * Search via CSS3 selectors * XML/HTML builder Nokogiri parses and searches XML/HTML quickly, and supports both CSS3 selectors and XPath. == SUPPORT: The Japanese-language Nokogiri {mailing list}[http://groups.google.com/group/nokogiri-list] * http://groups.google.com/group/nokogiri-list {Bug reports}[http://github.com/sparklemotion/nokogiri/issues] * http://github.com/sparklemotion/nokogiri/issues The IRC channel is #nokogiri on freenode. == SYNOPSIS: require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove')) #### # Search for nodes by css doc.css('h3.r a.l').each do |link| puts link.content end #### # Search for nodes by xpath doc.xpath('//h3/a[@class="l"]').each do |link| puts link.content end #### # Or mix and match. doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link| puts link.content end == REQUIREMENTS: * ruby 1.9.2 or higher * libxml2 * libxml2-dev * libxslt * libxslt-dev == INSTALL: * sudo gem install nokogiri == LICENSE: (The MIT License) Copyright (c) 2008 - 2010: * {Aaron Patterson}[http://tenderlovemaking.com] * {Mike Dalessio}[http://mike.daless.io] * {Charles Nutter}[http://blog.headius.com] * {Sergio Arbeo}[http://www.serabe.com] * {Patrick Mahoney}[http://polycrystal.org] * {Yoko Harada}[http://yokolet.blogspot.com] Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. nokogiri-1.6.1/Rakefile0000644000175000017500000002040512261213762014366 0ustar boutilboutil# -*- ruby -*- require 'rubygems' gem 'hoe' require 'hoe' Hoe.plugin :debugging Hoe.plugin :git Hoe.plugin :gemspec Hoe.plugin :bundler Hoe.add_include_dirs '.' GENERATED_PARSER = "lib/nokogiri/css/parser.rb" GENERATED_TOKENIZER = "lib/nokogiri/css/tokenizer.rb" CROSS_DIR = File.join(File.dirname(__FILE__), 'ports') def java? !! (RUBY_PLATFORM =~ /java/) end ENV['LANG'] = "en_US.UTF-8" # UBUNTU 10.04, Y U NO DEFAULT TO UTF-8? require 'tasks/nokogiri.org' HOE = Hoe.spec 'nokogiri' do developer 'Aaron Patterson', 'aaronp@rubyforge.org' developer 'Mike Dalessio', 'mike.dalessio@gmail.com' developer 'Yoko Harada', 'yokolet@gmail.com' developer 'Tim Elliott', 'tle@holymonkey.com' self.readme_file = ['README', ENV['HLANG'], 'rdoc'].compact.join('.') self.history_file = ['CHANGELOG', ENV['HLANG'], 'rdoc'].compact.join('.') self.extra_rdoc_files = FileList['*.rdoc','ext/nokogiri/*.c'] self.licenses = ['MIT'] self.clean_globs += [ 'nokogiri.gemspec', 'lib/nokogiri/nokogiri.{bundle,jar,rb,so}', 'lib/nokogiri/{1.9,2.0}', # GENERATED_PARSER, # GENERATED_TOKENIZER ] self.extra_deps += [ ["mini_portile", "~> 0.5.0"], ] self.extra_dev_deps += [ ["hoe-bundler", ">= 1.1"], ["hoe-debugging", ">= 1.0.3"], ["hoe-gemspec", ">= 1.0"], ["hoe-git", ">= 1.4"], ["minitest", "~> 2.2.2"], ["rake", ">= 0.9"], ["rake-compiler", "~> 0.8.0"], ["racc", ">= 1.4.6"], ["rexical", ">= 1.0.5"] ] if java? 
self.spec_extras = { :platform => 'java' } else self.spec_extras = { :extensions => ["ext/nokogiri/extconf.rb"], :required_ruby_version => '>= 1.9.2' } end self.testlib = :minitest end # ---------------------------------------- def add_file_to_gem relative_path target_path = File.join gem_build_path, relative_path target_dir = File.dirname(target_path) mkdir_p target_dir unless File.directory?(target_dir) rm_f target_path ln relative_path, target_path HOE.spec.files += [relative_path] end def gem_build_path File.join 'pkg', HOE.spec.full_name end if java? # TODO: clean this section up. require "rake/javaextensiontask" Rake::JavaExtensionTask.new("nokogiri", HOE.spec) do |ext| jruby_home = RbConfig::CONFIG['prefix'] ext.ext_dir = 'ext/java' ext.lib_dir = 'lib/nokogiri' jars = ["#{jruby_home}/lib/jruby.jar"] + FileList['lib/*.jar'] ext.classpath = jars.map { |x| File.expand_path x }.join ':' end task gem_build_path => [:compile] do add_file_to_gem 'lib/nokogiri/nokogiri.jar' end else mingw_available = true begin require 'tasks/cross_compile' rescue puts "WARNING: cross compilation not available: #{$!}" mingw_available = false end require "rake/extensiontask" HOE.spec.files.reject! { |f| f =~ %r{\.(java|jar)$} } windows_p = RbConfig::CONFIG['target_os'] == 'mingw32' || RbConfig::CONFIG['target_os'] =~ /mswin/ unless windows_p || java? 
task gem_build_path do add_file_to_gem "dependencies.yml" dependencies = YAML.load_file("dependencies.yml") %w[libxml2 libxslt].each do |lib| version = dependencies[lib] archive = File.join("ports", "archives", "#{lib}-#{version}.tar.gz") add_file_to_gem archive end end end Rake::ExtensionTask.new("nokogiri", HOE.spec) do |ext| ext.lib_dir = File.join(*['lib', 'nokogiri', ENV['FAT_DIR']].compact) ext.config_options << ENV['EXTOPTS'] if mingw_available ext.cross_compile = true ext.cross_platform = ["x86-mswin32-60", "x86-mingw32"] ext.cross_config_options << "--with-xml2-include=#{File.join($recipes["libxml2"].path, 'include', 'libxml2')}" ext.cross_config_options << "--with-xml2-lib=#{File.join($recipes["libxml2"].path, 'lib')}" ext.cross_config_options << "--with-iconv-dir=#{$recipes["libiconv"].path}" ext.cross_config_options << "--with-xslt-dir=#{$recipes["libxslt"].path}" ext.cross_config_options << "--with-zlib-dir=#{CROSS_DIR}" end end end # ---------------------------------------- desc "Generate css/parser.rb and css/tokenizer.rb" task 'generate' => [GENERATED_PARSER, GENERATED_TOKENIZER] task 'gem:spec' => 'generate' if Rake::Task.task_defined?("gem:spec") # This is a big hack to make sure that the racc and rexical # dependencies in the Gemfile are constrained to ruby platforms # (i.e. MRI and Rubinius). There's no way to do that through hoe, # and any solution will require changing hoe and hoe-bundler. old_gemfile_task = Rake::Task['bundler:gemfile'] rescue nil task 'bundler:gemfile' do old_gemfile_task.invoke if old_gemfile_task lines = File.open('Gemfile', 'r') { |f| f.readlines }.map do |line| line =~ /racc|rexical/ ? "#{line.strip}, :platform => :ruby" : line end File.open('Gemfile', 'w') { |f| lines.each { |line| f.puts line } } end file GENERATED_PARSER => "lib/nokogiri/css/parser.y" do |t| racc = RbConfig::CONFIG['target_os'] =~ /mswin32/ ? '' : `which racc`.strip racc = "#{::RbConfig::CONFIG['bindir']}/racc" if racc.empty?
racc = %x{command -v racc}.strip if racc.empty? sh "#{racc} -l -o #{t.name} #{t.prerequisites.first}" end file GENERATED_TOKENIZER => "lib/nokogiri/css/tokenizer.rex" do |t| sh "rex --independent -o #{t.name} #{t.prerequisites.first}" end [:compile, :check_manifest].each do |task_name| Rake::Task[task_name].prerequisites << GENERATED_PARSER Rake::Task[task_name].prerequisites << GENERATED_TOKENIZER end # ---------------------------------------- desc "set environment variables to build and/or test with debug options" task :debug do ENV['NOKOGIRI_DEBUG'] = "true" ENV['CFLAGS'] ||= "" ENV['CFLAGS'] += " -DDEBUG" end require 'tasks/test' task :java_debug do ENV['JAVA_OPTS'] = '-Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y' if java? && ENV['JAVA_DEBUG'] end if java? task :test_18 => :test task :test_19 do ENV['JRUBY_OPTS'] = "--1.9" Rake::Task["test"].invoke end end Rake::Task[:test].prerequisites << :compile Rake::Task[:test].prerequisites << :java_debug Rake::Task[:test].prerequisites << :check_extra_deps unless java? if Hoe.plugins.include?(:debugging) ['valgrind', 'valgrind:mem', 'valgrind:mem0'].each do |task_name| Rake::Task["test:#{task_name}"].prerequisites << :compile end end # ---------------------------------------- desc "build a windows gem without all the ceremony." task "gem:windows" => "gem" do cross_rubies = ["1.9.3-p194", "2.0.0-p0"] ruby_cc_version = cross_rubies.collect { |_| _.split("-").first }.join(":") # e.g., "1.9.3:2.0.0" rake_compiler_config_path = "#{ENV['HOME']}/.rake-compiler/config.yml" unless File.exist? rake_compiler_config_path raise "rake-compiler has not installed any cross rubies. try running 'env --unset=HOST rake-compiler cross-ruby VERSION=#{cross_rubies.first}'" end rake_compiler_config = YAML.load_file(rake_compiler_config_path) # check that rake-compiler config contains the right patchlevels.
see #279 for background, # and http://blog.mmediasys.com/2011/01/22/rake-compiler-updated-list-of-supported-ruby-versions-for-cross-compilation/ # for more up-to-date docs. cross_rubies.each do |version| majmin, patchlevel = version.split("-") rbconfig = "rbconfig-#{majmin}" unless rake_compiler_config.key?(rbconfig) && rake_compiler_config[rbconfig] =~ /-#{patchlevel}/ raise "rake-compiler '#{rbconfig}' not #{patchlevel}. try running 'env --unset=HOST rake-compiler cross-ruby VERSION=#{version}'" end end # verify that --export-all is in the 1.9 rbconfig. see #279,#374,#375. rbconfig_19 = rake_compiler_config["rbconfig-1.9.3"] raise "rbconfig #{rbconfig_19} needs --export-all in its DLDFLAGS value" if File.read(rbconfig_19).split("\n").grep(/CONFIG\["DLDFLAGS"\].*--export-all/).empty? rbconfig_20 = rake_compiler_config["rbconfig-2.0.0"] raise "rbconfig #{rbconfig_20} needs --export-all in its DLDFLAGS value" if File.read(rbconfig_20).split("\n").grep(/CONFIG\["DLDFLAGS"\].*--export-all/).empty? 
pkg_config_path = %w[libxslt libxml2].collect { |pkg| File.join($recipes[pkg].path, "lib/pkgconfig") }.join(":") sh("env PKG_CONFIG_PATH=#{pkg_config_path} RUBY_CC_VERSION=#{ruby_cc_version} rake cross native gem") || raise("build failed!") end # vim: syntax=Ruby nokogiri-1.6.1/bin/0000755000175000017500000000000012261213762013470 5ustar boutilboutilnokogiri-1.6.1/bin/nokogiri0000755000175000017500000000330112261213762015234 0ustar boutilboutil#!/usr/bin/env ruby require 'optparse' require 'open-uri' require 'irb' require 'uri' require 'rubygems' require 'nokogiri' parse_class = Nokogiri encoding = nil opts = OptionParser.new do |opts| opts.banner = "Nokogiri: an HTML, XML, SAX, and Reader parser" opts.define_head "Usage: nokogiri [options]" opts.separator "" opts.separator "Examples:" opts.separator " nokogiri http://www.ruby-lang.org/" opts.separator " nokogiri ./public/index.html" opts.separator " curl -s http://nokogiri.org | nokogiri -e'p $_.css(\"h1\").length'" opts.separator "" opts.separator "Options:" opts.on("--type [TYPE]", [:xml, :html]) do |v| parse_class = {:xml => Nokogiri::XML, :html => Nokogiri::HTML}[v] end opts.on("-E", "--encoding encoding", "Read as encoding (default #{encoding})") do |v| encoding = v end opts.on("-e command", "Specifies script from command-line.") do |v| @script = v end opts.on("--rng <uri|path>", "Validate using this rng file.") do |v| @rng = open(v) {|f| Nokogiri::XML::RelaxNG(f)} end opts.on_tail("-?", "--help", "Show this message") do puts opts exit end opts.on_tail("-v", "--version", "Show version") do puts Nokogiri::VersionInfo.instance.to_markdown exit end end opts.parse! uri = ARGV.shift if uri.to_s.strip.empty? && $stdin.tty? puts opts exit 1 end if $stdin.tty? @doc = parse_class.parse(open(uri).read, nil, encoding) else @doc = parse_class.parse($stdin, nil, encoding) end $_ = @doc if @rng @rng.validate(@doc).each do |error| puts error.message end else if @script eval @script, binding, '<main>' else puts "Your document is stored in @doc..." IRB.start end end nokogiri-1.6.1/checksums.yaml.gz0000444000175000017500000000041712261213762016210 0ustar boutilboutil gem "hoe-bundler", ">=1.1", :group => [:development, :test] gem "hoe-debugging", ">=1.0.3", :group => [:development, :test] gem "hoe-gemspec", ">=1.0", :group => [:development, :test] gem "hoe-git", ">=1.4", :group => [:development, :test] gem "mini_portile", ">=0.2.2", :group => [:development, :test] gem "minitest", "~>2.2.2", :group => [:development, :test] gem "rake", ">=0.9", :group => [:development, :test] gem "rake-compiler", "~>0.8.0", :group => [:development, :test] gem "racc", ">=1.4.6", :group => [:development, :test], :platform => :ruby gem "rexical", ">=1.0.5", :group => [:development, :test], :platform => :ruby gem "rdoc", "~>3.10", :group => [:development, :test] gem "hoe", "~>3.7", :group => [:development, :test] # vim: syntax=ruby nokogiri-1.6.1/STANDARD_RESPONSES.md0000644000175000017500000000250512261213762016105 0ustar boutilboutil# Standard Responses to Requests These responses are needed often enough that I figured, let's just check them in for future reference and use. # Not enough information to help Hello! Thanks for asking this question! However, without more information, Team Nokogiri cannot reproduce your issue, and so we cannot offer much help. Please provide us with: * A self-contained script (one that we can run without modification, and preferably without making external network connections). * Please note that you need to include the XML/HTML that you are operating on. * The output of `nokogiri -v`, which will provide details about your platform and versions of ruby, libxml2 and nokogiri. For more information about requesting help or reporting bugs, please take a look at http://bit.ly/nokohelp Thank you so much! # Not a bug Hello! Thanks for asking this question!
Your request for assistance using Nokogiri will not go unanswered! However, Nokogiri's Github Issues is reserved for reporting bugs or submitting patches. If you ask your question on the mailing list, Team Nokogiri promises someone will provide you with an answer in a timely manner. If you'd like to read up on Team Nokogiri's rationale for this policy, please go to http://bit.ly/nokohelp. Thank you so much for understanding! And thank you for using Nokogiri. nokogiri-1.6.1/Y_U_NO_GEMSPEC.md0000644000175000017500000001333612261213762015503 0ustar boutilboutil(note: this was originally a blog post published at http://blog.flavorjon.es/2012/03/y-u-no-gemspec.html) ## tl;dr 1. Team Nokogiri are not 10-foot-tall code-crunching robots, so `master` is usually unstable. 2. Unstable code can corrupt your data and crash your application, which would make everybody look bad. 3. Therefore, the _risk_ associated with using unstable code is severe; for you _and_ for Team Nokogiri. 4. The absence of a gemspec is a risk mitigation tactic. 5. You can always ask for an RC release. ## Why Isn't There a Gemspec!? OHAI! Thank you for asking this question! Team Nokogiri gets asked this pretty frequently. Just a sample from the historical record: * [Issue #274](https://github.com/sparklemotion/nokogiri/issues/274) * [Issue #371](https://github.com/sparklemotion/nokogiri/issues/371) * [A commit removing nokogiri.gemspec](https://github.com/sparklemotion/nokogiri/commit/7f17a643a05ca381d65131515b54d4a3a61ca2e1#commitcomment-667477) * [A nokogiri-talk thread](http://groups.google.com/group/nokogiri-talk/browse_thread/thread/4706b002e492d23f) * [Another nokogiri-talk thread](http://groups.google.com/group/nokogiri-talk/browse_thread/thread/0b201bb80ea3eea0) Sometimes people imply that we've forgotten, or that we don't know how to properly manage our codebase. Those people are super fun to respond to!
We've gone back and forth a couple of times over the past few years, but the current policy of Team Nokogiri is to **not** provide a gemspec in the Github repo. This is a conscious choice, not an oversight. ## But You Didn't Answer the Question! Ah, I was hoping you wouldn't notice. Well, OK, let's do this, if you're serious about it. I'd like to start by talking about _risk_. Specifically, the risk associated with using a known-unstable version of Nokogiri. ### Risk One common way to evaluate the _risk_ of an incident is: risk = probability x impact You can read more about this on [the internets](http://en.wikipedia.org/wiki/Risk_Matrix). The _risk_ associated with a Nokogiri bug could be loosely defined by answering the questions: * "How likely is it that a bug exists?" (probability) * "How severe will the consequences of a bug be?" (impact) ### Probability The `master` branch should be considered unstable. Team Nokogiri are not 10-foot-tall code-crunching robots; we are humans. We make mistakes, and as a result, any arbitrary commit on `master` is likely to contain bugs. Just as an example, Nokogiri `master` was unstable for about five months between November 2011 and March 2012. It was unstable not because we were sloppy, or didn't care, but because the fixes were hard and unobvious. When we release Nokogiri, we test for memory leaks and invalid memory access on all kinds of platforms with many flavors of Ruby and lots of versions of libxml2. Because these tests are time-consuming, we don't run them on every commit. We run them often when preparing a release. If we're releasing Nokogiri, it means we think it's rock solid. And if we're not releasing it, it means there are probably bugs. ### Impact Nokogiri is a gem with native extensions. This means it's not pure Ruby -- there's C or Java code being compiled and run, which means that there's always a chance that the gem will crash your application, or worse. 
Possible outcomes include: * leaking memory * corrupting data * making benign code crash (due to memory corruption) So, then, a bug in a native extension can have much worse downside than you might think. It's not just going to do something unexpected; it's possibly going to do terrible, awful things to your application and data. **Nobody** wants that to happen. Especially Team Nokogiri. ### Risk, Redux So, if you accept the equation risk = probability x impact and you believe me when I say that: * the probability of a bug in unreleased code is high, and * the impact of a bug is likely to be severe, then you should easily see that the _risk_ associated with a bug in Nokogiri is quite high. Part of Team Nokogiri's job is to try to mitigate this risk. We have a number of tactics that we use to accomplish this: * we respond quickly to bug reports, particularly when they are possible memory issues * we review each others' commits * we have a thorough test suite, and we test-drive new features * we discuss code design and issues on a core developer mailing list * we use valgrind to test for memory issues (leaks and invalid access) on multiple combinations of OS, libxml2 and Ruby * we package release candidates, and encourage devs to use them * **we do NOT commit a gemspec in our git repository** Yes, that's right, the absence of a gemspec is a risk mitigation tactic. Not only does Team Nokogiri not want to imply support for `master`, we want to **actively discourage** people from using it. Because it's not stable. ## But I Want to Do It Anyway Another option is to email the [nokogiri-talk list](http://groups.google.com/group/nokogiri-talk) and ask for a release candidate to be built. We're pretty accommodating if there's a bugfix that's a blocker for you. And if we can't release an RC, we'll tell you why. And in the end, nothing is stopping you from cloning the repo and generating a private gemspec.
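
As a rough sketch of what cloning and generating a private gemspec might look like (not an officially supported workflow): the Rakefile loads the hoe-gemspec plugin and references a `gem:spec` task, so a gemspec can presumably be generated inside a clone and then consumed through a Bundler `:path` source. The `vendor/nokogiri` location, the order of the rake tasks, and the need for `rake compile` are assumptions made for illustration.

```ruby
# Gemfile of an application that wants to try an unreleased Nokogiri.
#
# Assumed one-time setup, run by hand (the vendor/nokogiri path is hypothetical):
#
#   git clone https://github.com/sparklemotion/nokogiri.git vendor/nokogiri
#   cd vendor/nokogiri
#   bundle install
#   bundle exec rake gem:spec   # task referenced in the Rakefile; should write nokogiri.gemspec
#   bundle exec rake compile    # build the native extension in place
#
source "https://rubygems.org"

# Point Bundler at the local clone instead of a released gem.
gem "nokogiri", :path => "vendor/nokogiri"
```

Keeping the clone under a path you control means the unstable code is pinned to whatever commit you checked out, rather than tracking `master` implicitly.
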
This is an extra step or two, but it has the benefit of making sure developers have thought through the costs and risks involved; and it tends to select for developers who know what they're doing. ## In Conclusion Team Nokogiri takes stability very seriously. We want everybody who uses Nokogiri to have a pleasant experience. And so we want to make sure that you're using the best software we can make. Please keep in mind that we're trying very hard to do the right thing for all Nokogiri users out there in Rubyland. Nokogiri loves you very much, and we hope you love it back. nokogiri-1.6.1/test/0000755000175000017500000000000012261213762013677 5ustar boutilboutilnokogiri-1.6.1/test/test_xslt_transforms.rb0000644000175000017500000001657012261213762020544 0ustar boutilboutilrequire "helper" class TestXsltTransforms < Nokogiri::TestCase def setup @doc = Nokogiri::XML(File.open(XML_FILE)) end def test_class_methods style = Nokogiri::XSLT(File.read(XSLT_FILE)) assert result = style.apply_to(@doc, ['title', '"Grandma"']) assert_match %r{

Grandma

}, result end def test_transform assert style = Nokogiri::XSLT.parse(File.read(XSLT_FILE)) assert result = style.apply_to(@doc, ['title', '"Booyah"']) assert_match %r{

Booyah

}, result assert_match %r{}, result assert_match %r{}, result assert_match %r{}, result assert_match %r{}, result assert_match %r{EMP0003}, result assert_match %r{Margaret Martin}, result assert_match %r{Computer Specialist}, result assert_match %r{100,000}, result assert_no_match %r{Dallas|Texas}, result assert_no_match %r{Female}, result assert result = style.apply_to(@doc, ['title', '"Grandma"']) assert_match %r{

Grandma

}, result assert result = style.apply_to(@doc) assert_match %r{

}, result end def test_transform_with_output_style xslt = "" if Nokogiri.jruby? xslt = Nokogiri::XSLT(<<-eoxslt) eoxslt else xslt = Nokogiri::XSLT(<<-eoxslt) eoxslt end assert_no_match(//, xslt.apply_to(@doc, ['title', 'foo'])) end def test_transform_arg_error assert style = Nokogiri::XSLT(File.read(XSLT_FILE)) assert_raises(TypeError) do style.transform(@doc, :foo) end end def test_transform_with_hash assert style = Nokogiri::XSLT(File.read(XSLT_FILE)) result = style.transform(@doc, {'title' => '"Booyah"'}) assert result.html? assert_equal "Booyah", result.at_css("h1").content end def test_transform2 assert style = Nokogiri::XSLT(File.open(XSLT_FILE)) assert result_doc = style.transform(@doc) assert result_doc.html? assert_equal "", result_doc.at_css("h1").content assert style = Nokogiri::XSLT(File.read(XSLT_FILE)) assert result_doc = style.transform(@doc, ['title', '"Booyah"']) assert result_doc.html? assert_equal "Booyah", result_doc.at_css("h1").content assert result_string = style.apply_to(@doc, ['title', '"Booyah"']) assert_equal result_string, style.serialize(result_doc) end def test_transform_with_quote_params assert style = Nokogiri::XSLT(File.open(XSLT_FILE)) assert result_doc = style.transform(@doc, Nokogiri::XSLT.quote_params(['title', 'Booyah'])) assert result_doc.html? assert_equal "Booyah", result_doc.at_css("h1").content assert style = Nokogiri::XSLT.parse(File.read(XSLT_FILE)) assert result_doc = style.transform(@doc, Nokogiri::XSLT.quote_params({'title' => 'Booyah'})) assert result_doc.html? assert_equal "Booyah", result_doc.at_css("h1").content end def test_quote_params h = { :sym => %{xxx}, 'str' => %{"xxx"}, :sym2 => %{'xxx'}, 'str2' => %{x'x'x}, :sym3 => %{x"x"x}, } hh=h.dup result_hash = Nokogiri::XSLT.quote_params(h) assert_equal hh, h # non-destructive a=h.to_a.flatten result_array = Nokogiri::XSLT.quote_params(a) assert_equal h.to_a.flatten, a #non-destructive assert_equal result_array, result_hash end if Nokogiri.uses_libxml? 
# By now, cannot get it working on JRuby, see: # http://yokolet.blogspot.com/2010/10/pure-java-nokogiri-xslt-extension.html def test_exslt assert doc = Nokogiri::XML.parse(File.read(EXML_FILE)) assert doc.xml? assert style = Nokogiri::XSLT.parse(File.read(EXSLT_FILE)) params = { :p1 => 'xxx', :p2 => "x'x'x", :p3 => 'x"x"x', :p4 => '"xxx"' } result_doc = Nokogiri::XML.parse(style.apply_to(doc, Nokogiri::XSLT.quote_params(params))) assert_equal 'func-result', result_doc.at('/root/function').content assert_equal 3, result_doc.at('/root/max').content.to_i assert_match( /\d{4}-\d\d-\d\d([-|+]\d\d:\d\d)?/, result_doc.at('/root/date').content ) result_doc.xpath('/root/params/*').each do |p| assert_equal p.content, params[p.name.intern] end check_params result_doc, params result_doc = Nokogiri::XML.parse(style.apply_to(doc, Nokogiri::XSLT.quote_params(params.to_a.flatten))) check_params result_doc, params end def test_xslt_paramaters xslt_str = <<-EOX EOX xslt = Nokogiri::XSLT(xslt_str) doc = Nokogiri::XML("") assert_match %r{bar}, xslt.transform(doc, Nokogiri::XSLT.quote_params('foo' => 'bar')).to_s end def test_xslt_transform_error xslt_str = <<-EOX EOX xslt = Nokogiri::XSLT(xslt_str) doc = Nokogiri::XML("") assert_raises(RuntimeError) { xslt.transform(doc) } end end def test_xslt_parse_error xslt_str = <<-EOX } EOX assert_raises(RuntimeError) { Nokogiri::XSLT.parse(xslt_str) } end def test_passing_a_non_document_to_transform xsl = Nokogiri::XSLT('') assert_raises(ArgumentError) { xsl.transform("
") } assert_raises(ArgumentError) { xsl.transform(Nokogiri::HTML("").css("body")) } end def check_params result_doc, params result_doc.xpath('/root/params/*').each do |p| assert_equal p.content, params[p.name.intern] end end end nokogiri-1.6.1/test/helper.rb0000644000175000017500000001070012261213762015501 0ustar boutilboutil#Process.setrlimit(Process::RLIMIT_CORE, Process::RLIM_INFINITY) unless RUBY_PLATFORM =~ /(java|mswin|mingw)/i $VERBOSE = true require 'minitest/autorun' require 'minitest/pride' require 'fileutils' require 'tempfile' require 'pp' require 'nokogiri' warn "#{__FILE__}:#{__LINE__}: version info: #{Nokogiri::VERSION_INFO.inspect}" module Nokogiri class TestCase < MiniTest::Spec ASSETS_DIR = File.expand_path File.join(File.dirname(__FILE__), 'files') ADDRESS_SCHEMA_FILE = File.join(ASSETS_DIR, 'address_book.rlx') ADDRESS_XML_FILE = File.join(ASSETS_DIR, 'address_book.xml') ENCODING_HTML_FILE = File.join(ASSETS_DIR, 'encoding.html') ENCODING_XHTML_FILE = File.join(ASSETS_DIR, 'encoding.xhtml') EXML_FILE = File.join(ASSETS_DIR, 'exslt.xml') EXSLT_FILE = File.join(ASSETS_DIR, 'exslt.xslt') HTML_FILE = File.join(ASSETS_DIR, 'tlm.html') METACHARSET_FILE = File.join(ASSETS_DIR, 'metacharset.html') NICH_FILE = File.join(ASSETS_DIR, '2ch.html') NOENCODING_FILE = File.join(ASSETS_DIR, 'noencoding.html') PO_SCHEMA_FILE = File.join(ASSETS_DIR, 'po.xsd') PO_XML_FILE = File.join(ASSETS_DIR, 'po.xml') SHIFT_JIS_HTML = File.join(ASSETS_DIR, 'shift_jis.html') SHIFT_JIS_XML = File.join(ASSETS_DIR, 'shift_jis.xml') SNUGGLES_FILE = File.join(ASSETS_DIR, 'snuggles.xml') XML_FILE = File.join(ASSETS_DIR, 'staff.xml') XML_XINCLUDE_FILE = File.join(ASSETS_DIR, 'xinclude.xml') XSLT_FILE = File.join(ASSETS_DIR, 'staff.xslt') def teardown if ENV['NOKOGIRI_GC'] STDOUT.putc '!' if RUBY_PLATFORM =~ /java/ require 'java' java.lang.System.gc else GC.start end end end def assert_indent amount, doc, message = nil nodes = [] doc.traverse do |node| nodes << node if node.text? 
&& node.blank? end assert nodes.length > 0 nodes.each do |node| len = node.content.gsub(/[\r\n]/, '').length assert_equal(0, len % amount, message) end end def util_decorate(document, decorator_module) document.decorators(XML::Node) << decorator_module document.decorators(XML::NodeSet) << decorator_module document.decorate! end # # Test::Unit backwards compatibility section # alias :assert_no_match :refute_match alias :assert_not_nil :refute_nil alias :assert_raise :assert_raises alias :assert_not_equal :refute_equal end module SAX class TestCase < Nokogiri::TestCase class Doc < XML::SAX::Document attr_reader :start_elements, :start_document_called attr_reader :end_elements, :end_document_called attr_reader :data, :comments, :cdata_blocks, :start_elements_namespace attr_reader :errors, :warnings, :end_elements_namespace attr_reader :xmldecls attr_reader :processing_instructions def xmldecl version, encoding, standalone @xmldecls = [version, encoding, standalone].compact super end def start_document @start_document_called = true super end def end_document @end_document_called = true super end def error error (@errors ||= []) << error super end def warning warning (@warning ||= []) << warning super end def start_element *args (@start_elements ||= []) << args super end def start_element_namespace *args (@start_elements_namespace ||= []) << args super end def end_element *args (@end_elements ||= []) << args super end def end_element_namespace *args (@end_elements_namespace ||= []) << args super end def characters string @data ||= [] @data += [string] super end def comment string @comments ||= [] @comments += [string] super end def cdata_block string @cdata_blocks ||= [] @cdata_blocks += [string] super end def processing_instruction name, content @processing_instructions ||= [] @processing_instructions << [name, content] end end end end end nokogiri-1.6.1/test/test_nokogiri.rb0000644000175000017500000000710412261213762017106 0ustar boutilboutilrequire "helper" class 
TestNokogiri < Nokogiri::TestCase def test_versions version_match = /\d+\.\d+\.\d+/ assert_match version_match, Nokogiri::VERSION assert_equal Nokogiri::VERSION_INFO['ruby']['version'], ::RUBY_VERSION assert_equal Nokogiri::VERSION_INFO['ruby']['platform'], ::RUBY_PLATFORM if Nokogiri.uses_libxml? assert_match version_match, Nokogiri::LIBXML_VERSION assert_equal 'extension', Nokogiri::VERSION_INFO['libxml']['binding'] assert_match version_match, Nokogiri::VERSION_INFO['libxml']['compiled'] assert_equal Nokogiri::LIBXML_VERSION, Nokogiri::VERSION_INFO['libxml']['compiled'] assert_match version_match, Nokogiri::VERSION_INFO['libxml']['loaded'] Nokogiri::LIBXML_PARSER_VERSION =~ /(\d)(\d{2})(\d{2})/ major = $1.to_i minor = $2.to_i bug = $3.to_i assert_equal "#{major}.#{minor}.#{bug}", Nokogiri::VERSION_INFO['libxml']['loaded'] end end def test_libxml_iconv assert Nokogiri.const_defined?(:LIBXML_ICONV_ENABLED) if Nokogiri.uses_libxml? end def test_parse_with_io doc = Nokogiri.parse( StringIO.new("") ) assert_instance_of Nokogiri::HTML::Document, doc end def test_xml? doc = Nokogiri.parse(File.read(XML_FILE)) assert doc.xml? assert !doc.html? end def test_html? doc = Nokogiri.parse(File.read(HTML_FILE)) assert !doc.xml? assert doc.html? end def test_nokogiri_method_with_html doc1 = Nokogiri(File.read(HTML_FILE)) doc2 = Nokogiri.parse(File.read(HTML_FILE)) assert_equal doc1.serialize, doc2.serialize end def test_nokogiri_method_with_block doc = Nokogiri { b "bold tag" } assert_equal('bold tag', doc.to_html.chomp) end def test_make_with_html doc = Nokogiri.make("bold tag") assert_equal('bold tag', doc.to_html.chomp) end def test_make_with_block doc = Nokogiri.make { b "bold tag" } assert_equal('bold tag', doc.to_html.chomp) end SLOP_HTML = <<-END
  • one
  • two
one
div two
END def test_slop_css doc = Nokogiri::Slop(<<-eohtml)
one
div two
div three
eohtml assert_equal "div", doc.html.body.div.div('.foo').name end def test_slop doc = Nokogiri::Slop(SLOP_HTML) assert_equal "one", doc.html.body.ul.li.first.text assert_equal "two", doc.html.body.ul.li(".blue").text assert_equal "div two", doc.html.body.div.div.text assert_equal "two", doc.html.body.ul.li(:css => ".blue").text assert_equal "two", doc.html.body.ul.li(:xpath => "position()=2").text assert_equal "one", doc.html.body.ul.li(:xpath => ["contains(text(),'o')"]).first.text assert_equal "two", doc.html.body.ul.li(:xpath => ["contains(text(),'o')","contains(text(),'t')"]).text assert_raise(NoMethodError) { doc.nonexistent } end def test_slop_decorator doc = Nokogiri(SLOP_HTML) assert !doc.decorators(Nokogiri::XML::Node).include?(Nokogiri::Decorators::Slop) doc.slop! assert doc.decorators(Nokogiri::XML::Node).include?(Nokogiri::Decorators::Slop) doc.slop! assert_equal 1, doc.decorators(Nokogiri::XML::Node).select { |d| d == Nokogiri::Decorators::Slop }.size end end nokogiri-1.6.1/test/test_encoding_handler.rb0000644000175000017500000000226612261213762020554 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" class TestEncodingHandler < Nokogiri::TestCase def teardown Nokogiri::EncodingHandler.clear_aliases! end def test_get assert_not_nil Nokogiri::EncodingHandler['UTF-8'] assert_nil Nokogiri::EncodingHandler['alsdkjfhaldskjfh'] end def test_name eh = Nokogiri::EncodingHandler['UTF-8'] assert_equal "UTF-8", eh.name end def test_alias Nokogiri::EncodingHandler.alias('UTF-8', 'UTF-18') assert_equal 'UTF-8', Nokogiri::EncodingHandler['UTF-18'].name end def test_cleanup_aliases assert_nil Nokogiri::EncodingHandler['UTF-9'] Nokogiri::EncodingHandler.alias('UTF-8', 'UTF-9') assert_not_nil Nokogiri::EncodingHandler['UTF-9'] Nokogiri::EncodingHandler.clear_aliases! 
assert_nil Nokogiri::EncodingHandler['UTF-9'] end def test_delete assert_nil Nokogiri::EncodingHandler['UTF-9'] Nokogiri::EncodingHandler.alias('UTF-8', 'UTF-9') assert_not_nil Nokogiri::EncodingHandler['UTF-9'] Nokogiri::EncodingHandler.delete 'UTF-9' assert_nil Nokogiri::EncodingHandler['UTF-9'] end def test_delete_non_existent assert_nil Nokogiri::EncodingHandler.delete('UTF-9') end end nokogiri-1.6.1/test/test_reader.rb0000644000175000017500000003732312261213762016535 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" class TestReader < Nokogiri::TestCase def test_from_io_sets_io_as_source io = File.open SNUGGLES_FILE reader = Nokogiri::XML::Reader.from_io(io) assert_equal io, reader.source end def test_empty_element? reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) Paris eoxml results = reader.map do |node| if node.node_type == Nokogiri::XML::Node::ELEMENT_NODE node.empty_element? end end assert_equal [false, false, nil, nil, true, nil], results end def test_self_closing? reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) Paris eoxml results = reader.map do |node| if node.node_type == Nokogiri::XML::Node::ELEMENT_NODE node.self_closing? 
end end assert_equal [false, false, nil, nil, true, nil], results end # Issue #831 # Make sure that the reader doesn't block reading the entire input def test_reader_blocking rd, wr = IO.pipe() node_out = nil t = Thread.start do reader = Nokogiri::XML::Reader(rd, 'UTF-8') reader.each do |node| node_out = node break end end sleep(1) # sleep for one second to make sure the reader will actually block for input wr.puts "" wr.puts "" * 10000 wr.flush res = t.join(5) # wait 5 seconds for the thread to finish wr.close rd.close refute_nil node_out, "Didn't read any nodes, exclude the trivial case" refute_nil res, "Reader blocks trying to read the entire stream" end def test_reader_takes_block options = nil Nokogiri::XML::Reader(File.read(XML_FILE), XML_FILE) do |cfg| options = cfg options.nonet.nowarning.dtdattr end assert options.nonet? assert options.nowarning? assert options.dtdattr? end def test_nil_raises assert_raises(ArgumentError) { Nokogiri::XML::Reader.from_memory(nil) } assert_raises(ArgumentError) { Nokogiri::XML::Reader.from_io(nil) } end def test_from_io io = File.open SNUGGLES_FILE reader = Nokogiri::XML::Reader.from_io(io) assert_equal false, reader.default? assert_equal [false, false, false, false, false, false, false], reader.map { |x| x.default? } end def test_io io = File.open SNUGGLES_FILE reader = Nokogiri::XML::Reader(io) assert_equal false, reader.default? assert_equal [false, false, false, false, false, false, false], reader.map { |x| x.default? } end def test_string_io io = StringIO.new(<<-eoxml) snuggles! eoxml reader = Nokogiri::XML::Reader(io) assert_equal false, reader.default? assert_equal [false, false, false, false, false, false, false], reader.map { |x| x.default? } end class ReallyBadIO def read(size) 'a' * size ** 10 end end class ReallyBadIO4Java def read(size=1) 'a' * size ** 10 end end def test_io_that_reads_too_much if Nokogiri.jruby? 
io = ReallyBadIO4Java.new Nokogiri::XML::Reader(io) else io = ReallyBadIO.new Nokogiri::XML::Reader(io) end end def test_in_memory assert Nokogiri::XML::Reader(<<-eoxml) snuggles! eoxml end def test_reader_holds_on_to_string xml = <<-eoxml snuggles! eoxml reader = Nokogiri::XML::Reader(xml) assert_equal xml, reader.source end def test_default? reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_equal false, reader.default? assert_equal [false, false, false, false, false, false, false], reader.map { |x| x.default? } end def test_value? reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_equal false, reader.value? assert_equal [false, true, false, true, false, true, false], reader.map { |x| x.value? } end def test_read_error_document reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_raises(Nokogiri::XML::SyntaxError) do reader.each { |node| } end assert 1, reader.errors.length end def test_attributes? reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_equal false, reader.attributes? assert_equal [true, false, true, false, true, false, true], reader.map { |x| x.attributes? } end def test_attributes reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_equal({}, reader.attributes) assert_equal [{'xmlns:tenderlove'=>'http://tenderlovemaking.com/', 'xmlns'=>'http://mothership.connection.com/'}, {}, {"awesome"=>"true"}, {}, {"awesome"=>"true"}, {}, {'xmlns:tenderlove'=>'http://tenderlovemaking.com/', 'xmlns'=>'http://mothership.connection.com/'}], reader.map { |x| x.attributes } end def test_attribute_roundtrip reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml reader.each do |node| node.attributes.each do |key, value| assert_equal value, node.attribute(key) end end end def test_attribute_at reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! 
eoxml assert_nil reader.attribute_at(nil) assert_nil reader.attribute_at(0) assert_equal ['http://tenderlovemaking.com/', nil, 'true', nil, 'true', nil, 'http://tenderlovemaking.com/'], reader.map { |x| x.attribute_at(0) } end def test_attribute reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_nil reader.attribute(nil) assert_nil reader.attribute('awesome') assert_equal [nil, nil, 'true', nil, 'true', nil, nil], reader.map { |x| x.attribute('awesome') } end def test_attribute_length reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_equal 0, reader.attribute_count assert_equal [1, 0, 1, 0, 0, 0, 0], reader.map { |x| x.attribute_count } end def test_depth reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_equal 0, reader.depth assert_equal [0, 1, 1, 2, 1, 1, 0], reader.map { |x| x.depth } end def test_encoding string = <<-eoxml

The quick brown fox jumps over the lazy dog.

日本語が上手です

eoxml reader = Nokogiri::XML::Reader.from_memory(string, nil, 'UTF-8') assert_equal ['UTF-8'], reader.map { |x| x.encoding }.uniq end def test_xml_version reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_nil reader.xml_version assert_equal ['1.0'], reader.map { |x| x.xml_version }.uniq end def test_lang reader = Nokogiri::XML::Reader.from_memory(<<-eoxml)

The quick brown fox jumps over the lazy dog.

日本語が上手です

eoxml assert_nil reader.lang assert_equal [nil, nil, "en", "en", "en", nil, "ja", "ja", "ja", nil, nil], reader.map { |x| x.lang } end def test_value reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) snuggles! eoxml assert_nil reader.value assert_equal [nil, "\n ", nil, "snuggles!", nil, "\n ", nil], reader.map { |x| x.value } end def test_prefix reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) hello eoxml assert_nil reader.prefix assert_equal [nil, nil, "edi", nil, "edi", nil, nil], reader.map { |n| n.prefix } end def test_node_type reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) hello eoxml assert_equal 0, reader.node_type assert_equal [1, 14, 1, 3, 15, 14, 15], reader.map { |n| n.node_type } end def test_inner_xml str = "hello" reader = Nokogiri::XML::Reader.from_memory(str) reader.read assert_equal "hello", reader.inner_xml end def test_outer_xml str = ["hello", "hello", "hello", "", ""] reader = Nokogiri::XML::Reader.from_memory(str.first) xml = [] reader.map { |node| xml << node.outer_xml } assert_equal str, xml end def test_outer_xml_with_empty_nodes str = ["", "", ""] reader = Nokogiri::XML::Reader.from_memory(str.first) xml = [] reader.map { |node| xml << node.outer_xml } assert_equal str, xml end def test_state reader = Nokogiri::XML::Reader.from_memory('bar
') assert reader.state end def test_ns_uri reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) hello eoxml assert_nil reader.namespace_uri assert_equal([nil, nil, "http://ecommerce.example.org/schema", nil, "http://ecommerce.example.org/schema", nil, nil], reader.map { |n| n.namespace_uri }) end def test_local_name reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) hello eoxml assert_nil reader.local_name assert_equal(["x", "#text", "foo", "#text", "foo", "#text", "x"], reader.map { |n| n.local_name }) end def test_name reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) hello eoxml assert_nil reader.name assert_equal(["x", "#text", "edi:foo", "#text", "edi:foo", "#text", "x"], reader.map { |n| n.name }) end def test_base_uri reader = Nokogiri::XML::Reader.from_memory(<<-eoxml) eoxml assert_nil reader.base_uri assert_equal(["http://base.example.org/base/", "http://base.example.org/base/", "http://base.example.org/base/", "http://base.example.org/base/", "http://other.example.org/", "http://base.example.org/base/", "http://base.example.org/base/relative", "http://base.example.org/base/relative", "http://base.example.org/base/relative", "http://base.example.org/base/relative", "http://base.example.org/base/relative", "http://base.example.org/base/", "http://base.example.org/base/"], reader.map {|n| n.base_uri }) end def test_xlink_href_without_base_uri reader = Nokogiri::XML::Reader(<<-eoxml) Link Linked Element eoxml reader.each do |node| if node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT if node.name == 'link' assert_nil node.base_uri end end end end def test_xlink_href_with_base_uri reader = Nokogiri::XML::Reader(<<-eoxml) Link Linked Element eoxml reader.each do |node| if node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT assert_equal node.base_uri, "http://base.example.org/base/" end end end def test_read_from_memory called = false reader = Nokogiri::XML::Reader.from_memory('bar') reader.each do |node| called = true assert node end assert called end 
def test_large_document_smoke_test # simply run on a large document to verify that there no GC issues xml = [] xml << "" 10000.times { |j| xml << "" } xml << "" xml = xml.join("\n") Nokogiri::XML::Reader.from_memory(xml).each do |e| e.attributes end end def test_correct_outer_xml_inclusion xml = Nokogiri::XML::Reader.from_io(StringIO.new(<<-eoxml)) child-1 child-2 child-3 eoxml nodelengths = [] has_child2 = [] xml.each do |node| if node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT and node.name == "child" nodelengths << node.outer_xml.length has_child2 << !!(node.outer_xml =~ /child-2/) end end assert_equal(nodelengths[0], nodelengths[1]) assert(has_child2[1]) assert(!has_child2[0]) end def test_correct_inner_xml_inclusion xml = Nokogiri::XML::Reader.from_io(StringIO.new(<<-eoxml)) child-1 child-2 child-3 eoxml nodelengths = [] has_child2 = [] xml.each do |node| if node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT and node.name == "child" nodelengths << node.inner_xml.length has_child2 << !!(node.inner_xml =~ /child-2/) end end assert_equal(nodelengths[0], nodelengths[1]) assert(has_child2[1]) assert(!has_child2[0]) end end nokogiri-1.6.1/test/namespaces/0000755000175000017500000000000012261213762016016 5ustar boutilboutilnokogiri-1.6.1/test/namespaces/test_namespaces_in_builder_doc.rb0000644000175000017500000000511512261213762024544 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestNamespacesInBuilderDoc < Nokogiri::TestCase def setup super b = Nokogiri::XML::Builder.new do |x| x.fruit(:xmlns => 'ns:fruit', :'xmlns:veg' => 'ns:veg', :'xmlns:xlink' => 'http://www.w3.org/1999/xlink') do x.pear { x.bosc } x.orange x[:veg].carrot do x.cheese(:xmlns => 'ns:dairy', :'xlink:href' => 'http://example.com/cheese/') end x[:meat].bacon(:'xmlns:meat' => 'ns:meat') do x.apple :count => 2 x[:veg].tomato end end end @doc = b.doc end def check_namespace e e.namespace.nil? ? 
nil : e.namespace.href end def test_builder_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root) end def test_builder_parent_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root.elements[0]) assert_equal 'ns:fruit', check_namespace(@doc.root.elements[1]) end def test_builder_grandparent_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root.elements[0].elements[0]) end def test_builder_parent_nondefault_ns assert_equal 'ns:veg', check_namespace(@doc.root.elements[2]) end def test_builder_single_decl_ns_1 assert_equal 'ns:dairy', check_namespace(@doc.root.elements[2].elements[0]) end def test_builder_nondefault_attr_ns assert_equal 'http://www.w3.org/1999/xlink', check_namespace(@doc.root.elements[2].elements[0].attribute_nodes.find { |a| a.name =~ /href/ }) end def test_builder_single_decl_ns_2 assert_equal 'ns:meat', check_namespace(@doc.root.elements[3]) end def test_builder_buried_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root.elements[3].elements[0]) end def test_builder_buried_decl_ns assert_equal 'ns:veg', check_namespace(@doc.root.elements[3].elements[1]) end def test_builder_namespace_count n = @doc.root.clone n.children.each(&:remove) ns_attrs = n.to_xml.scan(/\bxmlns(?::.+?)?=/) assert_equal 3, ns_attrs.length end def test_builder_namespaced_attribute_on_unparented_node doc = Nokogiri::XML::Builder.new do |x| x.root('xmlns:foo' => 'http://foo.io') { x.obj('foo:attr' => 'baz') } end.doc assert_equal 'http://foo.io', doc.root.children.first.attribute_nodes.first.namespace.href end end end end nokogiri-1.6.1/test/namespaces/test_namespaces_in_created_doc.rb0000644000175000017500000000551212261213762024526 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestNamespacesInCreatedDoc < Nokogiri::TestCase def setup super @doc = Nokogiri::XML('') pear = @doc.create_element('pear') bosc = @doc.create_element('bosc') pear.add_child(bosc) @doc.root << pear @doc.root.add_child('') carrot = 
@doc.create_element('veg:carrot') @doc.root << carrot cheese = @doc.create_element('cheese', :xmlns => 'ns:dairy', :'xlink:href' => 'http://example.com/cheese/') carrot << cheese bacon = @doc.create_element('meat:bacon', :'xmlns:meat' => 'ns:meat') apple = @doc.create_element('apple') apple['count'] = 2 bacon << apple tomato = @doc.create_element('veg:tomato') bacon << tomato @doc.root << bacon end def check_namespace e e.namespace.nil? ? nil : e.namespace.href end def test_created_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root) end def test_created_parent_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root.elements[0]) assert_equal 'ns:fruit', check_namespace(@doc.root.elements[1]) end def test_created_grandparent_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root.elements[0].elements[0]) end def test_created_parent_nondefault_ns assert_equal 'ns:veg', check_namespace(@doc.root.elements[2]) end def test_created_single_decl_ns_1 assert_equal 'ns:dairy', check_namespace(@doc.root.elements[2].elements[0]) end def test_created_nondefault_attr_ns assert_equal 'http://www.w3.org/1999/xlink', check_namespace(@doc.root.elements[2].elements[0].attribute_nodes.find { |a| a.name =~ /href/ }) end def test_created_single_decl_ns_2 assert_equal 'ns:meat', check_namespace(@doc.root.elements[3]) end def test_created_buried_default_ns assert_equal 'ns:fruit', check_namespace(@doc.root.elements[3].elements[0]) end def test_created_buried_decl_ns assert_equal 'ns:veg', check_namespace(@doc.root.elements[3].elements[1]) end def test_created_namespace_count n = @doc.root.clone n.children.each(&:remove) ns_attrs = n.to_xml.scan(/\bxmlns(?::.+?)?=/) assert_equal 3, ns_attrs.length end def test_created_namespaced_attribute_on_unparented_node doc = Nokogiri::XML('') node = @doc.create_element('obj', 'foo:attr' => 'baz') doc.root.add_child(node) assert_equal 'http://foo.io', doc.root.children.first.attribute_nodes.first.namespace.href end end end end 
nokogiri-1.6.1/test/namespaces/test_additional_namespaces_in_builder_doc.rb0000644000175000017500000000055712261213762026741 0ustar boutilboutil
require "helper"

module Nokogiri
  module XML
    class TestAdditionalNamespacesInBuilderDoc < Nokogiri::TestCase
      def test_builder_namespaced_root_node_ns
        b = Nokogiri::XML::Builder.new do |x|
          x[:foo].RDF(:'xmlns:foo' => 'http://foo.io')
        end
        assert_equal 'http://foo.io', b.doc.root.namespace.href
      end
    end
  end
end
nokogiri-1.6.1/test/namespaces/test_namespaces_in_parsed_doc.rb0000644000175000017500000000436612261213762024403 0ustar boutilboutil
require "helper"

module Nokogiri
  module XML
    class TestNamespacesInParsedDoc < Nokogiri::TestCase
      def setup
        super
        @doc = Nokogiri::XML <<-eoxml
        eoxml
      end

      def check_namespace e
        e.namespace.nil? ? nil : e.namespace.href
      end

      def test_parsed_default_ns
        assert_equal 'ns:fruit', check_namespace(@doc.root)
      end

      def test_parsed_parent_default_ns
        assert_equal 'ns:fruit', check_namespace(@doc.root.elements[0])
        assert_equal 'ns:fruit', check_namespace(@doc.root.elements[1])
      end

      def test_parsed_grandparent_default_ns
        assert_equal 'ns:fruit', check_namespace(@doc.root.elements[0].elements[0])
      end

      def test_parsed_parent_nondefault_ns
        assert_equal 'ns:veg', check_namespace(@doc.root.elements[2])
      end

      def test_parsed_single_decl_ns_1
        assert_equal 'ns:dairy', check_namespace(@doc.root.elements[2].elements[0])
      end

      def test_parsed_nondefault_attr_ns
        assert_equal 'http://www.w3.org/1999/xlink',
          check_namespace(@doc.root.elements[2].elements[0].attribute_nodes.find { |a| a.name =~ /href/ })
      end

      def test_parsed_single_decl_ns_2
        assert_equal 'ns:meat', check_namespace(@doc.root.elements[3])
      end

      def test_parsed_buried_default_ns
        assert_equal 'ns:fruit', check_namespace(@doc.root.elements[3].elements[0])
      end

      def test_parsed_buried_decl_ns
        assert_equal 'ns:veg', check_namespace(@doc.root.elements[3].elements[1])
      end

      def test_parsed_namespace_count
        n = @doc.root.clone
        n.children.each(&:remove)
        ns_attrs = n.to_xml.scan(/\bxmlns(?::.+?)?=/)
        assert_equal 3, ns_attrs.length
      end
    end
  end
end
nokogiri-1.6.1/test/test_memory_leak.rb0000644000175000017500000000711312261213762017571 0ustar boutilboutil
require "helper"

class TestMemoryLeak < Nokogiri::TestCase
  def setup
    super
    @str = <
EOF
  end

  if ENV['NOKOGIRI_GC'] # turning these off by default for now
    def test_dont_hurt_em_why
      content = File.open("#{File.dirname(__FILE__)}/files/dont_hurt_em_why.xml").read
      ndoc = Nokogiri::XML(content)
      2.times do
        ndoc.search('status text').first.inner_text
        ndoc.search('user name').first.inner_text
        GC.start
      end
    end

    class BadIO
      def read(*args)
        raise 'hell'
      end

      def write(*args)
        raise 'chickens'
      end
    end

    def test_for_mem_leak_on_io_callbacks
      io = File.open SNUGGLES_FILE
      Nokogiri::XML.parse(io)

      loop do
        Nokogiri::XML.parse(BadIO.new) rescue nil
        doc.write BadIO.new rescue nil
      end
    end

    def test_for_memory_leak
      begin
        # we don't use Dike in any tests, but requiring it has side effects
        # that can create memory leaks, and that's what we're testing for.
        require 'rubygems'
        require 'dike' # do not remove!

        count_start = count_object_space_documents
        xml_data = <<-EOS
          abc 1234 Zzz
        EOS
        20.times do
          doc = Nokogiri::XML(xml_data)
          doc.xpath("//item")
        end
        2.times { GC.start }
        count_end = count_object_space_documents
        assert((count_end - count_start) <= 2, "memory leak detected")
      rescue LoadError
        puts "\ndike is not installed, skipping memory leak test"
      end
    end

    def test_node_set_namespace_mem_leak
      xml = Nokogiri::XML ""
      ctx = Nokogiri::XML::XPathContext.new(xml)
      loop do
        ctx.evaluate("//namespace::*")
      end
    end

    def test_leak_on_node_replace
      loop do
        doc = Nokogiri.XML("")
        n = Nokogiri::XML::CDATA.new(doc, "bar")
        pivot = doc.root.children[0]
        pivot.replace(n)
      end
    end

    def test_sax_parser_context
      io = StringIO.new(@str)

      loop do
        Nokogiri::XML::SAX::ParserContext.new(@str)
        Nokogiri::XML::SAX::ParserContext.new(io)
        io.rewind

        Nokogiri::HTML::SAX::ParserContext.new(@str)
        Nokogiri::HTML::SAX::ParserContext.new(io)
        io.rewind
      end
    end

    class JumpingSaxHandler < Nokogiri::XML::SAX::Document
      def initialize(jumptag)
        @jumptag = jumptag
        super()
      end

      def start_element(name, attrs = [])
        throw @jumptag
      end
    end

    def test_jumping_sax_handler
      doc = JumpingSaxHandler.new(:foo)

      loop do
        catch(:foo) do
          Nokogiri::HTML::SAX::Parser.new(doc).parse(@str)
        end
      end
    end

    def test_in_context_parser_leak
      loop do
        doc = Nokogiri::XML::Document.new
        fragment1 = Nokogiri::XML::DocumentFragment.new(doc, '')
        node = fragment1.children[0]
        node.parse('')
      end
    end

    def test_in_context_parser_leak_ii
      loop { Nokogiri::XML('').root.parse('') }
    end

    def test_leak_on_xpath_string_function
      doc = Nokogiri::XML(@str)
      loop do
        doc.xpath('name(//node())')
      end
    end
  end # if NOKOGIRI_GC

  private

  def count_object_space_documents
    count = 0
    ObjectSpace.each_object { |j| count += 1 if j.is_a?(Nokogiri::XML::Document) }
    count
  end
end
nokogiri-1.6.1/test/files/0000755000175000017500000000000012261213762015001 5ustar boutilboutil
nokogiri-1.6.1/test/files/exslt.xml0000644000175000017500000000017512261213762016665 0ustar boutilboutil
1 3 2
nokogiri-1.6.1/test/files/xinclude.xml0000644000175000017500000000021612261213762017335 0ustar boutilboutil
nokogiri-1.6.1/test/files/staff.xslt0000644000175000017500000000142412261213762017021 0ustar boutilboutil

Employee ID Name Position Salary
nokogiri-1.6.1/test/files/noencoding.html0000644000175000017500000001362412261213762020020 0ustar boutilboutil I have no encoding declaration.

I want one.

I really want one.

I really really want one.

I really really really want one.

I really really really really want one.

I really really really really really want one.

I really really really really really really want one.

I really really really really really really really want one.

I really really really really really really really really want one.

I really really really really really really really really really want one.

I really really really really really really really really really really want one.

I really really really really really really really really really really really want one.

I really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

I really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really really want one.

nokogiri-1.6.1/test/files/exslt.xslt0000644000175000017500000000207012261213762017053 0ustar boutilboutil nokogiri-1.6.1/test/files/encoding.xhtml0000644000175000017500000001633212261213762017652 0ustar boutilboutil Ă

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

\zB

nokogiri-1.6.1/test/files/shift_jis.xml0000644000175000017500000000020012261213762017475 0ustar boutilboutil This is a Shift_JIS File ɂ nokogiri-1.6.1/test/files/bogus.xml0000644000175000017500000000000012261213762016630 0ustar boutilboutilnokogiri-1.6.1/test/files/test_document_url/0000755000175000017500000000000012261213762020540 5ustar boutilboutilnokogiri-1.6.1/test/files/test_document_url/document.xml0000644000175000017500000000017712261213762023105 0ustar boutilboutil &bar; nokogiri-1.6.1/test/files/test_document_url/document.dtd0000644000175000017500000000014312261213762023051 0ustar boutilboutil nokogiri-1.6.1/test/files/test_document_url/bar.xml0000644000175000017500000000007412261213762022027 0ustar boutilboutil foobar nokogiri-1.6.1/test/files/staff.dtd0000644000175000017500000000064212261213762016603 0ustar boutilboutil Element data"> nokogiri-1.6.1/test/files/address_book.xml0000644000175000017500000000030012261213762020153 0ustar boutilboutil John Smith js@example.com Fred Bloggs fb@example.net nokogiri-1.6.1/test/files/address_book.rlx0000644000175000017500000000042512261213762020170 0ustar boutilboutil nokogiri-1.6.1/test/files/to_be_xincluded.xml0000644000175000017500000000015512261213762020653 0ustar boutilboutil this snippet is to be included from xinclude.xml nokogiri-1.6.1/test/files/staff.xml0000644000175000017500000000375512261213762016640 0ustar boutilboutil Element data"> ]> EMP0001 Margaret Martin Accountant 56,000 Female
1230 North Ave. Dallas, Texas 98551
EMP0002 Martha Raynolds Secretary 35,000 Female
&ent2; Dallas, &ent3; 98554
EMP0003 Roger Jones Department Manager 100,000 &ent4;
PO Box 27 Irving, texas 98553
EMP0004 Jeny Oconnor Personnel Director 95,000 Female
27 South Road. Dallas, Texas 98556
EMP0005 Robert Myers Computer Specialist 90,000 male
1821 Nordic. Road, Irving Texas 98558
nokogiri-1.6.1/test/files/shift_jis.html0000644000175000017500000000034112261213762017647 0ustar boutilboutil ɂ́I

This is a Shift_JIS File

ɂ́I

nokogiri-1.6.1/test/files/dont_hurt_em_why.xml0000644000175000017500000004451312261213762021110 0ustar boutilboutil Sat Aug 09 05:38:12 +0000 2008 882281424 I so just thought the guy lighting the Olympic torch was falling when he began to run on the wall. Wow that would have been catastrophic. web false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Sat Aug 09 02:04:56 +0000 2008 882145663 @ijonas - wow that stuff sounds sweet web false 882005142 1000471 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Sat Aug 09 01:36:41 +0000 2008 882126691 Steph is driving Sally for the first time. I'm proud of her. <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Fri Aug 08 22:21:21 +0000 2008 881987762 @ijonas - what are you making with couchdb, ruby and httparty? web false 881947237 1000471 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Fri Aug 08 14:20:14 +0000 2008 881535796 @oaknd1 - delete it off phone and iTunes. 
<a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 881526234 3038211 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Fri Aug 08 14:07:29 +0000 2008 881522394 Listening to U2 "beautiful day" in honor of it being such a beautiful day. <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Fri Aug 08 14:06:44 +0000 2008 881521592 @lizsmc1 - hi! (just passed her on the road) <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 881519558 11485452 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Fri Aug 08 13:59:35 +0000 2008 881514030 Beautiful day for a motorcycle ride. <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Fri Aug 08 13:45:21 +0000 2008 881499439 @lizamc1 - no way! Politos will be missed. I remember eating a whole garlic pizza there with Joe. 
<a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Fri Aug 08 13:41:50 +0000 2008 881496024 Riding my motorcyle to campus. Using the library basement for a meeting. <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Thu Aug 07 22:52:20 +0000 2008 880896190 Scraping super glue off my finger with a knife. web false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Thu Aug 07 22:14:35 +0000 2008 880866160 So cold...Starbucks north side, you win the day, but I shall bring a sweatshirt to our next battle! <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Thu Aug 07 17:25:40 +0000 2008 880610064 Headed home for a bit to get my headphones and then to the north side to meet @orderedlist. 
<a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 485 Thu Aug 07 17:15:59 +0000 2008 880600278 Panera wifi, why do you hate me? <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 485 Thu Aug 07 15:46:25 +0000 2008 880509577 At panera. Turned my alarm off this morning and woke up late. Oh well. <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 485 Thu Aug 07 15:01:35 +0000 2008 880463746 @kloh I remember days like that. Exhausting. Also, NullRiver said to sync again, uninstall app from phone and then resync app. Evidently ... web true 880432701 10193732 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 485 Thu Aug 07 02:52:38 +0000 2008 879986739 @kloh - I haven't updated my OS so I'm wondering how apple could have made the app stop working. I did contact NullRiver support just no ... 
web true 879980813 10193732 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Thu Aug 07 02:38:58 +0000 2008 879976045 @jerry - i went with the pdf download. web false 879878196 613 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Thu Aug 07 02:31:26 +0000 2008 879969851 @kloh - it worked at home for multiple tests and just stopped when I acyltually needed it. <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 879968483 10193732 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 Thu Aug 07 01:18:28 +0000 2008 879913748 Netshare will no longer start up for me. Of course it borks the first time I actually want to use it. <a href="http://help.twitter.com/index.php?pg=kb.page&id=75">txt</a> false 4243 John Nunemaker jnunemaker Mishawaka, IN, US Loves his wife, ruby, notre dame football and iu basketball http://s3.amazonaws.com/twitter_production/profile_images/53781608/Photo_75_normal.jpg http://addictedtonew.com false 486 nokogiri-1.6.1/test/files/valid_bar.xml0000644000175000017500000000005712261213762017450 0ustar boutilboutil nokogiri-1.6.1/test/files/2ch.html0000644000175000017500000001042712261213762016347 0ustar boutilboutil Q˂f‚ւ悤
nokogiri-1.6.1/test/files/tlm.html0000644000175000017500000021061012261213762016463 0ustar boutilboutil Tender Lovemaking

Back Home!

I'm finally back home. I went to Japan a few weeks ago for vacation, and I also spoke at Ruby Kaigi 2008. Ruby Kaigi was so much fun! I've been studying Japanese for a little over a year, but I had never been to Japan before. It was exciting and fun to talk to people, and I made a bunch of new Japanese friends. I'd really like to thank Leonard Chin for helping out at the Kaigi. My language skills aren't good enough, and he was kind enough to fill in the gaps. Thank you!

While I was in Japan, I noticed QR Codes everywhere. QR Codes are basically really awesome bar codes. They can hold much more information in a smaller amount of space. They can be easily decoded from images taken with digital cameras. They have these codes everywhere in Japan, and the idea is that people can take a photo with the camera on their cell phone, then the phone decodes the QR Code. I believe most of the QR codes contain information about the company, or possibly a URL to the company's website.

The company that created the format says that the format is open, but unfortunately I have to pay for the spec. I can download the spec in Japanese for free, but my Japanese isn't that good! So unfortunately I'm stuck with either the ISO spec (which is over $200) or the AIM spec ($85). I don't understand why they are so expensive..... I think I'll buy the AIM one, and hope that it is the same as the ISO one.

Written by Aaron Patterson

Meow meow meow meow meow

The other day I wrote an app called dejour to give me growl notifications from all the *jour gems out there. I used Eric Hodel's awesome ruby-growl library. Unfortunately it does all communications over the interweb, so you have to tweak some knobs in Growl to get it to work. I stumbled across a ruby/cocoa example using Growl, fixed it up, and released a gem called "Meow".

Meow lets you post notifications to your local machine without adjusting Growl. If you're on OS X 10.5, just do:

$ gem install meow

Then you can do this:

$ ruby -r rubygems -e'require "meow"; Meow.notify("meow", "meow", "meow")'

No growl tweaks required! Here is a code sample that is a little more explanatory:

require 'rubygems'
require 'meow'

meep = Meow.new('My Application Name')
meep.notify('Message Title', 'Message Description')

Be sure to check out the documentation.

Written by Aaron Patterson

Write your Rails view in……. JavaScript?

In my last post about Johnson, I said that next time I would talk about the JavaScript parse tree that Johnson provides. Well, I changed my mind. Sorry.

I want to write about a rails plugin that I added to Johnson. Brohuda Katz wrote an ERb type parser in JavaScript, and added it to the (yet to be released) Johnson distribution. With that in mind, and looking at the new template handlers in edge rails, I was able to throw together a rails plugin that allows me to use JavaScript in my rails view code.

Let's get to the code. Here is my controller:

class JohnsonController < ApplicationController
  def index
    @users = User.find(:all)
  end
end

And my EJS view (the file is named index.html.ejs):

<% for(var user in at.users) { %>
  <%= user.first_name() %><br />
<% } %>

The johnson rails plugin puts controller instance variables into a special javascript variable called "at". The "at" variable is actually a proxy to the controller, lazily fetching instance variables from the controller and importing those objects into javascript land.

Let's take a look at the plugin; it's only a few lines:

class EJSHandler < ActionView::TemplateHandler
  class EJSProxy # :nodoc:
    def initialize(controller)
      @controller = controller
    end

    def key?(pooperty)
      @controller.instance_variables.include?("@#{pooperty}")
    end

    def [](pooperty)
      @controller.instance_variable_get("@#{pooperty}")
    end

    def []=(pooperty, value)
      @controller.instance_variable_set("@#{pooperty}", value)
    end
  end

  def initialize(view)
    @view = view
  end

  def render(template)
    ctx = Johnson::Context.new
    ctx.evaluate('Johnson.require("johnson/template");')
    ctx['template'] = template.source
    ctx['controller'] = @view.controller
    ctx['at'] = EJSProxy.new(@view.controller)

    ctx.evaluate('Johnson.templatize(template).call(at)')
  end
end

ActionView::Template.register_template_handler("ejs", EJSHandler)

When the template gets rendered (the render method), I wrap the controller with an EJS proxy, then compile the template into a javascript function, and call that function. The "at" variable is set to the EJSProxy before executing the template, and every property access on the "at" variable is turned into a fetch of the matching instance variable from the controller.
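
The proxy idea can be tried on its own, without Rails or Johnson. What follows is only a minimal sketch under stated assumptions: FakeController and IvarProxy are hypothetical names invented for illustration, not part of the plugin. It uses instance_variable_defined? for the existence check, because on Ruby 1.9 and later instance_variables returns symbols, so comparing against a string would never match.

```ruby
# Hypothetical stand-in for a Rails controller (illustration only).
class FakeController
  def initialize
    @users = ["aaron", "john"]
  end
end

# Hypothetical proxy: maps hash-style access onto instance variables.
class IvarProxy
  def initialize(target)
    @target = target
  end

  # True if the target has an instance variable with this name.
  def key?(name)
    @target.instance_variable_defined?("@#{name}")
  end

  # Read the instance variable lazily, only when it is asked for.
  def [](name)
    @target.instance_variable_get("@#{name}")
  end

  # Write an instance variable back onto the target.
  def []=(name, value)
    @target.instance_variable_set("@#{name}", value)
  end
end

at = IvarProxy.new(FakeController.new)
puts at.key?("users")      # the variable exists on the controller
puts at["users"].inspect   # fetched only when asked for
at["title"] = "Index"      # assignments flow back to the controller
```

Nothing is copied up front; each lookup goes to the wrapped object at the moment it is requested, which is what "lazily" means here.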

Server side javascript coding in rails. Weird, eh?

Written by Aaron Patterson

Take it to the limit one more time

Sup bros. I need to post in this thing more often. Yesterday, someone tipped over my scooter again. I'm getting kind of tired of that.

Anyway, it's time for me to write about this. RKelly is pretty much dead. For the past few months, John and I have been working on RKelly's replacement called Johnson. Basically we're now putting a ruby wrapper around Mozilla's Spidermonkey. The project is coming along quite nicely. Ruby objects can be passed into javascript land, and javascript objects can be passed back into ruby land.

For example, we can define an alert function in our javascript context:

require 'johnson'

ctx = Johnson::Context.new
ctx['alert'] = lambda { |x| puts x }
ctx.evaluate('alert("Hello world!");')

Johnson::Context#evaluate also returns the value of the last statement evaluated. We can evaluate an expression, and manipulate the result in ruby land. For example, I'll create an object in javascript, return it to ruby land, then access a property of the javascript object:

require 'johnson'

ctx = Johnson::Context.new
obj = ctx.evaluate('var foo = { x: "hello world" }; foo')
puts obj.x  # => 'hello world'

We can even do the reverse by stuffing ruby objects into the context:

A = Struct.new(:foo)

ctx = Johnson::Context.new
ctx['alert'] = lambda { |x| puts x }
ctx['a'] = A.new("bar")
ctx.evaluate('alert(a.foo);') # => 'bar'

But it gets better. We added a top level variable called "Ruby" that lets you access constants and globals from Ruby land. We can rewrite the previous example completely in javascript:

ctx = Johnson::Context.new
ctx.evaluate("var x = new (new Ruby.Struct(Johnson.symbolize('foo')));")
ctx.evaluate("x.foo = 'bar'")
puts ctx.evaluate('x').foo # => 'bar'
puts ctx.evaluate('x').class # => #<Class:0x49714>

Since the 'Ruby' constant delegates to Object, you can access any constant, including ones you've defined yourself. We could, for example, look up a bunch of User records through rails:

ctx = Johnson::Context.new
ctx['alert'] = lambda { |x| puts x }
ctx.evaluate(<<-END
             for(var user in Ruby.User.find(Johnson.symbolize('all'))) {
               alert(user.first_name());
             }
             END
            )

You might be wondering what this Johnson.symbolize business is about. Since Javascript doesn't have a concept of a symbol, we've created a helper to "mark" a string as a symbol and pass it back into ruby land.

To conclude this update about my Johnson, I'd like to show off an interactive shell for Johnson (thanks to Brohuda Katz). Johnson has an interactive shell that lets you try things out in javascript land or ruby land, and lets you quickly switch between the two. Typing 'js' will put you in the javascript shell, 'rb' will switch you to the ruby shell. In the ruby shell, you can use the 'cx' variable to get hold of your javascript context:

$ ruby -I lib bin/johnson
js> var x = { foo: 'bar', hello: function() { return 'world' } };
=> nil
js> rb
rb> cx['x'].foo
=> "bar"
rb> cx['x'].hello()
=> "world"
rb>

We aren't quite ready for a release yet, but if you'd like to play around with Johnson, you can pull it down from github here. Just run 'rake', and you should have it compiled and running!

My next Johnson related post will be about Javascript parse trees and Javascript code generation.

Written by Aaron Patterson

New Ruby Implementation - Brobinius

Brobinius version 1.0.0 has been released!

I am happy to announce the first release of my new fork of Ruby called
Brobinius. The goal of Brobinius is to implement new language features
that I have noticed to be completely missing.

For example, Object#tase! The tase method is featured in many other languages
but is sadly missing from Ruby. Brobinius has a fully implemented tase! method.

>> x = Class.new
=> #<Class:0x3632ec>
>> x.tase!
RuntimeError: Don't tase me bro
from (irb):2:in `tase!'
from (irb):6
>>

Brobinius also has fully serializable kittenuations. You can create and
serialize your kittenuations, then pick up your snuggling where you left off.
For example:

>> lol = Kittenuation.new {
?> look_cute
>> throw :yarn
>> look_cute
>> }
=> #<Kittenuation:0x366244>
>> lol.snuggle
>> Marshal.load(Marshal.dump(lol)).snuggle

Since Brobinius's kittenuations are serializable, you can share them over the
network with friends!

Brobinius also features screencasts with automatic YouTube uploads. All you
have to do is write your program, then pass the --screencast option to Brobinius.
Brobinius will automatically create a screencast of your program and upload it
to YouTube:

$ brobinius --screencast my_code.rb

You can even add the --geoff flag to create screencasts with Geoffrey
Grosenbach doing the voice over.

Written by Aaron Patterson

Laser Etch My Macbook Air

Dear Lazyweb,

I would really like to Laser Etch my Macbook Air with Martha Stewart's face. Where can I get that done? How much would it cost?

Written by Aaron Patterson

mechanize version 0.7.5 has been released!

The Mechanize library is used for automating interaction with websites.
Mechanize automatically stores and sends cookies, follows redirects,
can follow links, and submit forms. Form fields can be populated and
submitted. Mechanize also keeps track of the sites that you have visited as
a history.

Changes:

# Mechanize CHANGELOG

## 0.7.5

Written by Aaron Patterson

Profiling Database Queries in Rails

Despite the recent Ruby webserver speed contests, most of the slowness at my job results from slow (or too many) database queries.

To help keep database queries down, I added a stats box to every page that shows the number of queries vs. cache hits, the number of rows returned, and the amount of data transferred from the database. In this screenshot I'm using the "live" environment, with 3 cache hits, 169 misses, 577 rows returned, and 458.9k of data transferred. Clicking the box hides it, and clicking "Super Hide!" hides the box and sets a cookie so that the box doesn't show up again for a while.

Debug Window

To get this working, first I monkey patch the MysqlAdapter to collect database stats:

ActiveRecord::ConnectionAdapters::MysqlAdapter.module_eval do
    @@stats_queries = @@stats_bytes = @@stats_rows = 0

    def self.get_stats
      { :queries => @@stats_queries,
        :rows => @@stats_rows,
        :bytes => @@stats_bytes }
    end
    def self.reset_stats
      @@stats_queries = @@stats_bytes = @@stats_rows = 0
    end

    def select_with_stats(sql, name)
      bytes = 0
      rows = select_without_stats(sql, name)
      rows.each do |row|
        row.each do |key, value|
          bytes += key.length
          bytes += value.length if value
        end
      end
      @@stats_queries += 1
      @@stats_rows += rows.length
      @@stats_bytes += bytes
      rows
    end
    alias_method_chain :select, :stats
  end
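
The alias_method_chain call above comes from ActiveSupport. Roughly, alias_method_chain :select, :stats keeps the original method reachable as select_without_stats and points select at select_with_stats. The same wrapping can be sketched in plain Ruby with two alias_method calls. This is only an illustration under stated assumptions: the Adapter class and its query_count counter are hypothetical stand-ins, not the real MysqlAdapter.

```ruby
# Hypothetical stand-in for a database adapter (illustration only).
class Adapter
  def select(sql)
    [{ "id" => "1", "name" => "aaron" }]
  end
end

# Reopen the class and wrap #select by hand.
class Adapter
  class << self
    attr_accessor :query_count
  end
  self.query_count = 0

  def select_with_stats(sql)
    rows = select_without_stats(sql) # call the original implementation
    self.class.query_count += 1      # record that a query was made
    rows                             # hand back the untouched result
  end

  alias_method :select_without_stats, :select # keep the original reachable
  alias_method :select, :select_with_stats    # route callers through the wrapper
end

adapter = Adapter.new
adapter.select("SELECT * FROM users")
puts Adapter.query_count
```

Callers keep calling select and never notice the wrapper, which is why the stats can be collected without touching any application code.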

Next I patched the QueryCache to keep track of hits and misses:

ActiveRecord::ConnectionAdapters::QueryCache.module_eval do
    @@hits = @@misses = 0

    def self.get_stats
      { :hits => @@hits,
        :misses => @@misses }
    end
    def self.reset_stats
      @@hits = @@misses = 0
    end

    def cache_sql_with_stats(sql, &block)
      if @query_cache.has_key?(sql)
        @@hits += 1
      else
        @@misses += 1
      end
      cache_sql_without_stats(sql, &block)
    end
    alias_method_chain :cache_sql, :stats
  end

Then modify ActionController to reset stats for each request:

ActionController::Base.module_eval do
    def perform_action_with_reset
      ActiveRecord::ConnectionAdapters::MysqlAdapter::reset_stats
      ActiveRecord::ConnectionAdapters::QueryCache::reset_stats
      perform_action_without_reset
    end

    alias_method_chain :perform_action, :reset

    def active_record_runtime(runtime)
      stats = ActiveRecord::ConnectionAdapters::MysqlAdapter::get_stats
      "#{super} #{sprintf("%.1fk", stats[:bytes].to_f / 1024)} queries: #{stats[:queries]}"
    end
  end

Just drop all that inside the after_initialize in your development.rb and you'll get the nice stats. After that, just create a partial that displays the stats and include the partial at the bottom of your layout. Our partial looks like this:

<% unless %w(production test).include?(RAILS_ENV) -%>
  <h4 id="debug" onclick="$(this).remove()" style="background:pink;text-align:center;position:absolute;top:16px;left:35%;padding:0.5em;border: 2px solid red;">
  <%= RAILS_ENV %>
  <br />
  <% if ActiveRecord::ConnectionAdapters::QueryCache.respond_to?(:get_stats) %>
    <% stats = ActiveRecord::ConnectionAdapters::QueryCache.get_stats %>
    Queries: <%= stats[:hits] %> / <%= stats[:misses] %> /
    <%= number_to_percentage((stats[:hits].to_f / (stats[:hits] + stats[:misses])) * 100, :precision => 0) %>
    |
  <% end %>
  <% if ActiveRecord::ConnectionAdapters::MysqlAdapter.respond_to?(:get_stats) %>
    <% stats = ActiveRecord::ConnectionAdapters::MysqlAdapter.get_stats %>
    Rows: <%= stats[:rows] %> |
    Transfer: <%= sprintf("%.1fk", stats[:bytes].to_f / 1024) %>
  <% end %>
  <p style="margin:0">
    <a style="color:magenta" href="#" onclick="superHide()">super hide!</a>
  </p>
  </h4>
  <script type="text/javascript">
    function superHide() {
      document.cookie = 'debug=hidden; path=/; domain=<%= request.host %>; max-age=14400';
    }
    if(document.cookie.indexOf('debug=hidden') != -1) {
      $('debug').hide();
    }
  </script>
<% end -%>

It's a little work, but it helps keep my mind on reducing the queries. With enough work, one of these days the speed of the webserver will matter to me. Thanks to Adam Doppelt for the basis of this monkey patch. Any bugs are mine, not his!

Written by Aaron Patterson

mechanize version 0.7.1 has been released!

The Mechanize library is used for automating interaction with websites.
Mechanize automatically stores and sends cookies, follows redirects,
can follow links, and submit forms. Form fields can be populated and
submitted. Mechanize also keeps track of the sites that you have visited as
a history.

Changes:

# Mechanize CHANGELOG

## 0.7.1

  • Added iPhone to the user agent aliases. [#17572]
  • Fixed a bug with EOF errors in net/http. [#17570]
  • Handling 0 length gzipped responses. [#17471]

  • http://mechanize.rubyforge.org/

Written by Aaron Patterson

Automated Youtube Uploads

I thought I would share the part of my twitterbrite scripts that uploads videos to Youtube. It's about 30 lines long, and took me an hour or so to write. Most of my time was spent figuring out form fields to fill out rather than writing code though....

I've broken the script down to three parts: logging in, setting the video attributes, and uploading the video file.

Step 1: Logging In

The first step is pretty simple. Just instantiate a new mechanize object, fetch youtube.com, set your login credentials, and submit!

agent = WWW::Mechanize.new { |a|
  a.user_agent_alias = 'Mac Safari'
}
page = agent.get('http://youtube.com/')

# Login
page.form('loginForm') { |f|
  f.username = 'username'
  f.password = 'password'
}.submit

Step 2: Setting video attributes

This is probably the most difficult step. Now that the agent is logged in, we have to fetch the upload page and fill out the video attributes form. You have to set the title, description, category, and keywords for your video. Then you have to tell the agent to click a special button.

# Set the video attributes
page = agent.get('http://youtube.com/my_videos_upload')
form = page.form('theForm')
form.field_myvideo_title = 'My video title'
form.field_myvideo_descr = "My video description"
form.field_myvideo_categories = 28
form.field_myvideo_keywords = 'my tag'
page = form.submit(form.buttons.name('action_upload').first)

The number "28" is just the value from the category drop down list. You can iterate over the select options using mechanize, but I leave that as an exercise to the reader.

Step 3: Upload the video file

My script expects that the video file name will be supplied on the command line, so ARGV[0] should point to the file you want to upload. In this step, you simply set the video file name, then submit the form.

# Upload the video
page = page.form('theForm') { |f|
  f.file_uploads.name('field_uploadfile').first.file_name = ARGV[0]
}.submit
page.body =~ /<textarea[^>]*>(.*)<\/textarea>/m
puts $1

The last two lines grab the HTML needed to display the video and print it.
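
The extraction step can be tried without uploading anything. Below is a small sketch that runs the same pattern against a made-up response body; the HTML and the embed_code field name are assumptions invented for illustration, not what YouTube actually returned.

```ruby
# Made-up response body standing in for the page returned after upload.
body = <<-HTML
<html><body>
<textarea name="embed_code" rows="3">
<object width="425" height="355"></object>
</textarea>
</body></html>
HTML

# The /m flag lets "." match newlines, so the capture can span lines.
# String#[] with a regexp and a group index returns that capture, or nil
# when nothing matches, which avoids relying on the $1 global.
embed = body[/<textarea[^>]*>(.*)<\/textarea>/m, 1]

puts embed.strip unless embed.nil?
```

Note that ".*" is greedy: if a page had several textareas, the capture would run to the last closing tag. Using ".*?" instead would stop at the first one.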

There you go. Upload lots of videos now! Yay!

Written by Aaron Patterson

nokogiri-1.6.1/test/files/foo/0000755000175000017500000000000012261213762015564 5ustar boutilboutilnokogiri-1.6.1/test/files/foo/foo.xsd0000644000175000017500000000024212261213762017065 0ustar boutilboutil nokogiri-1.6.1/test/files/bar/0000755000175000017500000000000012261213762015545 5ustar boutilboutilnokogiri-1.6.1/test/files/bar/bar.xsd0000644000175000017500000000021512261213762017027 0ustar boutilboutil nokogiri-1.6.1/test/files/po.xsd0000644000175000017500000000440512261213762016142 0ustar boutilboutil Purchase order schema for Example.com. Copyright 2000 Example.com. All rights reserved. nokogiri-1.6.1/test/files/encoding.html0000644000175000017500000001600312261213762017455 0ustar boutilboutil Ă

\zB


nokogiri-1.6.1/test/files/metacharset.html0000644000175000017500000000023412261213762020166 0ustar boutilboutil $B$?$3>F$-2>LL(B

$B0-$$;v$r9=A[Cf!#(B

nokogiri-1.6.1/test/files/snuggles.xml0000644000175000017500000000016512261213762017354 0ustar boutilboutil snuggles! nokogiri-1.6.1/test/files/saml/0000755000175000017500000000000012261213762015735 5ustar boutilboutilnokogiri-1.6.1/test/files/saml/saml20protocol_schema.xsd0000644000175000017500000003270012261213762022657 0ustar boutilboutil Document identifier: saml-schema-protocol-2.0 Location: http://docs.oasis-open.org/security/saml/v2.0/ Revision history: V1.0 (November, 2002): Initial Standard Schema. V1.1 (September, 2003): Updates within the same V1.0 namespace. V2.0 (March, 2005): New protocol schema based in a SAML V2.0 namespace. nokogiri-1.6.1/test/files/saml/xmldsig_schema.xsd0000644000175000017500000002406512261213762021453 0ustar boutilboutil ]> nokogiri-1.6.1/test/files/saml/xenc_schema.xsd0000644000175000017500000001207112261213762020733 0ustar boutilboutil ]> nokogiri-1.6.1/test/files/saml/saml20assertion_schema.xsd0000644000175000017500000003137412261213762023033 0ustar boutilboutil Document identifier: saml-schema-assertion-2.0 Location: http://docs.oasis-open.org/security/saml/v2.0/ Revision history: V1.0 (November, 2002): Initial Standard Schema. V1.1 (September, 2003): Updates within the same V1.0 namespace. V2.0 (March, 2005): New assertion schema for SAML V2.0 namespace. nokogiri-1.6.1/test/files/po.xml0000644000175000017500000000164312261213762016145 0ustar boutilboutil Alice Smith 123 Maple Street Mill Valley CA 90952 Robert Smith 8 Oak Avenue Old Town PA 95819 Hurry, my lawn is going wild! 
Lawnmower 1 148.95 Confirm this is electric Baby Monitor 1 39.98 1999-05-21 nokogiri-1.6.1/test/xslt/0000755000175000017500000000000012261213762014671 5ustar boutilboutilnokogiri-1.6.1/test/xslt/test_exception_handling.rb0000644000175000017500000000142212261213762022116 0ustar boutilboutilrequire "helper" module Nokogiri module XSLT class TestExceptionHandling < Nokogiri::TestCase def test_java_exception_handling skip('This test is for Java only') if Nokogiri.uses_libxml? xml = Nokogiri.XML(<<-EOXML) EOXML xsl = Nokogiri.XSLT(<<-EOXSL) EOXSL begin xsl.transform xml fail('It should not get here') rescue RuntimeError => e assert_match(/HIERARCHY_REQUEST_ERR/, e.to_s, 'The exception message does not contain the expected information') end end end end end nokogiri-1.6.1/test/xslt/test_custom_functions.rb0000644000175000017500000000660112261213762021662 0ustar boutilboutil# -*- encoding: utf-8 -*- require "helper" module Nokogiri module XSLT class TestCustomFunctions < Nokogiri::TestCase def setup super @xml = Nokogiri.XML(<<-EOXML) Foo

Foo

Lorem ipsum.

EOXML end def test_function skip("Pure Java version doesn't support this feature.") if !Nokogiri.uses_libxml? foo = Class.new do def capitalize nodes nodes.first.content.upcase end end XSLT.register "http://e.org/functions", foo xsl = Nokogiri.XSLT(<<-EOXSL) EOXSL result = xsl.transform @xml assert_equal 'FOO', result.css('title').first.text end def test_function_arguments skip("Pure Java version doesn't support this feature.") if !Nokogiri.uses_libxml? foo = Class.new do include MiniTest::Assertions # Minitest 5 uses `self.assertions` in `assert()` which is not # defined in the Minitest::Assertions module :-( attr_writer :assertions def assertions; @assertions ||= 0; end def multiarg *args assert_equal ["abc", "xyz"], args args.first end def numericarg arg assert_equal 42, arg arg end end xsl = Nokogiri.XSLT(<<-EOXSL, "http://e.org/functions" => foo) EOXSL xsl.transform @xml end def test_function_XSLT skip("Pure Java version doesn't support this feature.") if !Nokogiri.uses_libxml? foo = Class.new do def america nodes nodes.first.content.upcase end end xsl = Nokogiri.XSLT(<<-EOXSL, "http://e.org/functions" => foo) EOXSL result = xsl.transform @xml assert_equal 'FOO', result.css('title').first.text end end end end nokogiri-1.6.1/test/test_soap4r_sax.rb0000644000175000017500000000176112261213762017353 0ustar boutilboutilrequire "helper" module XSD module XMLParser class Parser @factory_added = nil class << self; attr_reader :factory_added; end def self.add_factory o @factory_added = o end def initialize *args @charset = nil end def characters foo end def start_element *args end def end_element *args end end end end require 'xsd/xmlparser/nokogiri' class TestSoap4rSax < Nokogiri::TestCase def test_factory_added assert_equal XSD::XMLParser::Nokogiri, XSD::XMLParser::Nokogiri.factory_added end def test_parse o = Class.new(::XSD::XMLParser::Nokogiri) do attr_accessor :element_started def initialize *args super @element_started = false end def start_element *args 
@element_started = true end end.new 'foo' o.do_parse '' assert o.element_started, 'element started' end end nokogiri-1.6.1/test/xml/0000755000175000017500000000000012261213762014477 5ustar boutilboutilnokogiri-1.6.1/test/xml/test_builder.rb0000644000175000017500000002150512261213762017514 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module XML class TestBuilder < Nokogiri::TestCase def test_attribute_sensitivity xml = Nokogiri::XML::Builder.new { |x| x.tag "hello", "abcDef" => "world" }.to_xml doc = Nokogiri.XML xml assert_equal 'world', doc.root['abcDef'] end def test_builder_with_utf8_text text = "test ﺵ " doc = Nokogiri::XML::Builder.new(:encoding => "UTF-8") { |xml| xml.test text }.doc assert_equal text, doc.content end def test_builder_escape xml = Nokogiri::XML::Builder.new { |x| x.condition "value < 1", :attr => "value < 1" }.to_xml doc = Nokogiri.XML xml assert_equal 'value < 1', doc.root['attr'] assert_equal 'value < 1', doc.root.content end def test_builder_namespace doc = Nokogiri::XML::Builder.new { |xml| xml.a("xmlns:a" => "x") do xml.b("xmlns:a" => "x", "xmlns:b" => "y") end }.doc b = doc.at('b') assert b assert_equal({"xmlns:a"=>"x", "xmlns:b"=>"y"}, b.namespaces) assert_equal({"xmlns:b"=>"y"}, namespaces_defined_on(b)) end def test_builder_namespace_part_deux doc = Nokogiri::XML::Builder.new { |xml| xml.a("xmlns:b" => "y") do xml.b("xmlns:a" => "x", "xmlns:b" => "y", "xmlns:c" => "z") end }.doc b = doc.at('b') assert b assert_equal({"xmlns:a"=>"x", "xmlns:b"=>"y", "xmlns:c"=>"z"}, b.namespaces) assert_equal({"xmlns:a"=>"x", "xmlns:c"=>"z"}, namespaces_defined_on(b)) end def test_builder_with_unlink b = Nokogiri::XML::Builder.new do |xml| xml.foo do xml.bar { xml.parent.unlink } xml.bar2 end end assert b end def test_with_root doc = Nokogiri::XML(File.read(XML_FILE)) Nokogiri::XML::Builder.with(doc.at('employee')) do |xml| xml.foo end assert_equal 1, doc.xpath('//employee/foo').length end def 
test_root_namespace_default_decl b = Nokogiri::XML::Builder.new { |xml| xml.root(:xmlns => 'one:two') } doc = b.doc assert_equal 'one:two', doc.root.namespace.href assert_equal({ 'xmlns' => 'one:two' }, doc.root.namespaces) end def test_root_namespace_multi_decl b = Nokogiri::XML::Builder.new { |xml| xml.root(:xmlns => 'one:two', 'xmlns:foo' => 'bar') do xml.hello end } doc = b.doc assert_equal 'one:two', doc.root.namespace.href assert_equal({ 'xmlns' => 'one:two', 'xmlns:foo' => 'bar' }, doc.root.namespaces) assert_equal 'one:two', doc.at('hello').namespace.href end def test_non_root_namespace b = Nokogiri::XML::Builder.new { |xml| xml.root { xml.hello(:xmlns => 'one') } } assert_equal 'one', b.doc.at('hello', 'xmlns' => 'one').namespace.href end def test_specify_namespace b = Nokogiri::XML::Builder.new { |xml| xml.root('xmlns:foo' => 'bar') do xml[:foo].bar xml['foo'].baz end } doc = b.doc assert_equal 'bar', doc.at('foo|bar', 'foo' => 'bar').namespace.href assert_equal 'bar', doc.at('foo|baz', 'foo' => 'bar').namespace.href end def test_dtd_in_builder_output builder = Nokogiri::XML::Builder.new do |xml| xml.doc.create_internal_subset( 'html', "-//W3C//DTD HTML 4.01 Transitional//EN", "http://www.w3.org/TR/html4/loose.dtd" ) xml.root do xml.foo end end assert_match(//, builder.to_xml) end def test_specify_namespace_nested b = Nokogiri::XML::Builder.new { |xml| xml.root('xmlns:foo' => 'bar') do xml.yay do xml[:foo].bar xml.yikes do xml['foo'].baz end end end } doc = b.doc assert_equal 'bar', doc.at('foo|bar', 'foo' => 'bar').namespace.href assert_equal 'bar', doc.at('foo|baz', 'foo' => 'bar').namespace.href end def test_specified_namespace_postdeclared doc = Nokogiri::XML::Builder.new { |xml| xml.a do xml[:foo].b("xmlns:foo" => "bar") end }.doc a = doc.at('a') assert_equal({}, a.namespaces) b = doc.at_xpath('//foo:b', {:foo=>'bar'}) assert b assert_equal({"xmlns:foo"=>"bar"}, b.namespaces) assert_equal("b", b.name) assert_equal("bar", b.namespace.href) end def 
test_specified_namespace_undeclared Nokogiri::XML::Builder.new { |xml| xml.root do assert_raises(ArgumentError) do xml[:foo].bar end end } end def test_set_encoding builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml| xml.root do xml.bar 'blah' end end assert_match 'UTF-8', builder.to_xml end def test_bang_and_underscore_is_escaped builder = Nokogiri::XML::Builder.new do |xml| xml.root do xml.p_('adsfadsf') xml.p!('adsfadsf') end end assert_equal 2, builder.doc.xpath('//p').length end def test_square_brackets_set_attributes builder = Nokogiri::XML::Builder.new do |xml| xml.root do foo = xml.foo foo['id'] = 'hello' assert_equal 'hello', foo['id'] end end assert_equal 1, builder.doc.xpath('//foo[@id = "hello"]').length end def test_nested_local_variable @ivar = 'hello' local_var = 'hello world' builder = Nokogiri::XML::Builder.new do |xml| xml.root do xml.foo local_var xml.bar @ivar xml.baz { xml.text @ivar } end end assert_equal 'hello world', builder.doc.at('//root/foo').content assert_equal 'hello', builder.doc.at('//root/bar').content assert_equal 'hello', builder.doc.at('baz').content end def test_raw_append builder = Nokogiri::XML::Builder.new do |xml| xml.root do xml << 'hello' end end assert_equal 'hello', builder.doc.at('/root').content end def test_raw_append_with_instance_eval builder = Nokogiri::XML::Builder.new do root do self << 'hello' end end assert_equal 'hello', builder.doc.at('/root').content end def test_raw_xml_append builder = Nokogiri::XML::Builder.new do |xml| xml.root do xml << '' end end assert_equal ["aaa"], builder.doc.at_css("root").children.collect(&:name) assert_equal ["bbb","ccc"], builder.doc.at_css("aaa").children.collect(&:name) end def test_raw_xml_append_with_namespaces doc = Nokogiri::XML::Builder.new do |xml| xml.root("xmlns:foo" => "x", "xmlns" => "y") do xml << '' end end.doc el = doc.at 'Element' assert_not_nil el assert_equal 'y', el.namespace.href assert_nil el.namespace.prefix attr = el.attributes["bar"] 
assert_not_nil attr assert_not_nil attr.namespace assert_equal "foo", attr.namespace.prefix end def test_cdata builder = Nokogiri::XML::Builder.new do root { cdata "hello world" } end assert_equal("", builder.to_xml.gsub(/\n/, "")) end def test_comment builder = Nokogiri::XML::Builder.new do root { comment "this is a comment" } end assert builder.doc.root.children.first.comment? end def test_builder_no_block string = "hello world" builder = Nokogiri::XML::Builder.new builder.root { cdata string } assert_equal("", builder.to_xml.gsub(/\n/, '')) end private def namespaces_defined_on(node) Hash[*node.namespace_definitions.collect{|n| ["xmlns:" + n.prefix, n.href]}.flatten] end end end end nokogiri-1.6.1/test/xml/test_document_fragment.rb0000644000175000017500000001631212261213762021567 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestDocumentFragment < Nokogiri::TestCase def setup super @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) end def test_replace_text_node html = "foo" doc = Nokogiri::XML::DocumentFragment.parse(html) doc.children[0].replace "bar" assert_equal 'bar', doc.children[0].content end def test_fragment_is_relative doc = Nokogiri::XML('
') ctx = doc.root.child fragment = Nokogiri::XML::DocumentFragment.new(doc, '', ctx) hello = fragment.child assert_equal 'hello', hello.name assert_equal doc.root.child.namespace, hello.namespace end def test_node_fragment_is_relative doc = Nokogiri::XML('
') assert doc.root.child fragment = doc.root.child.fragment('') hello = fragment.child assert_equal 'hello', hello.name assert_equal doc.root.child.namespace, hello.namespace end def test_new assert Nokogiri::XML::DocumentFragment.new(@xml) end def test_fragment_should_have_document fragment = Nokogiri::XML::DocumentFragment.new(@xml) assert_equal @xml, fragment.document end def test_name fragment = Nokogiri::XML::DocumentFragment.new(@xml) assert_equal '#document-fragment', fragment.name end def test_static_method fragment = Nokogiri::XML::DocumentFragment.parse("
a
") assert_instance_of Nokogiri::XML::DocumentFragment, fragment end def test_static_method_with_namespaces # follows different path in FragmentHandler#start_element which blew up after 597195ff fragment = Nokogiri::XML::DocumentFragment.parse("a") assert_instance_of Nokogiri::XML::DocumentFragment, fragment end def test_many_fragments 100.times { Nokogiri::XML::DocumentFragment.new(@xml) } end def test_subclass klass = Class.new(Nokogiri::XML::DocumentFragment) fragment = klass.new(@xml, "
a
") assert_instance_of klass, fragment end def test_subclass_parse klass = Class.new(Nokogiri::XML::DocumentFragment) doc = klass.parse("
a
") assert_instance_of klass, doc end def test_xml_fragment fragment = Nokogiri::XML.fragment("
a
") assert_equal "
a
", fragment.to_s end def test_xml_fragment_has_multiple_toplevel_children doc = "
b
e
" fragment = Nokogiri::XML::Document.new.fragment(doc) assert_equal "
b
e
", fragment.to_s end def test_xml_fragment_has_outer_text # this test is descriptive, not prescriptive. doc = "a
b
" fragment = Nokogiri::XML::Document.new.fragment(doc) assert_equal "a
b
", fragment.to_s doc = "
b
c" fragment = Nokogiri::XML::Document.new.fragment(doc) assert_equal "
b
c", fragment.to_s end def test_xml_fragment_case_sensitivity doc = "b" fragment = Nokogiri::XML::Document.new.fragment(doc) assert_equal "b", fragment.to_s end def test_xml_fragment_with_leading_whitespace doc = "
b
" fragment = Nokogiri::XML::Document.new.fragment(doc) assert_equal "
b
", fragment.to_s end def test_xml_fragment_with_leading_whitespace_and_newline doc = " \n
b
" fragment = Nokogiri::XML::Document.new.fragment(doc) assert_equal " \n
b
", fragment.to_s end def test_fragment_children_search fragment = Nokogiri::XML::Document.new.fragment( '

hi

' ) css = fragment.children.css('p') xpath = fragment.children.xpath('.//p') assert_equal css, xpath end def test_fragment_search frag = Nokogiri::XML::Document.new.fragment '

hi

' p_tag = frag.css('#content').first assert_equal 'p', p_tag.name assert_equal p_tag, frag.xpath('./*[@id = \'content\']').first end def test_fragment_without_a_namespace_does_not_get_a_namespace doc = Nokogiri::XML <<-EOX EOX frag = doc.fragment "" assert_nil frag.namespace end def test_fragment_namespace_resolves_against_document_root doc = Nokogiri::XML <<-EOX EOX ns = doc.root.namespace_definitions.detect { |x| x.prefix == "bar" } frag = doc.fragment "" assert frag.children.first.namespace assert_equal ns, frag.children.first.namespace end def test_fragment_invalid_namespace_is_silently_ignored doc = Nokogiri::XML <<-EOX EOX frag = doc.fragment "" assert_nil frag.children.first.namespace end def test_decorator_is_applied x = Module.new do def awesome! end end util_decorate(@xml, x) fragment = Nokogiri::XML::DocumentFragment.new(@xml, "
a
b
") assert node_set = fragment.css('div') assert node_set.respond_to?(:awesome!) node_set.each do |node| assert node.respond_to?(:awesome!), node.class end assert fragment.children.respond_to?(:awesome!), fragment.children.class end def test_add_node_to_doc_fragment_segfault frag = Nokogiri::XML::DocumentFragment.new(@xml, '

hello world

') Nokogiri::XML::Comment.new(frag,'moo') end if Nokogiri.uses_libxml? def test_for_libxml_in_context_fragment_parsing_bug_workaround 10.times do begin fragment = Nokogiri::XML.fragment("
") parent = fragment.children.first child = parent.parse("

").first parent.add_child child end GC.start end end def test_for_libxml_in_context_memory_badness_when_encountering_encoding_errors # see issue #643 for background # this test exists solely to raise an error during valgrind test runs. html = <<-EOHTML
Foo
EOHTML doc = Nokogiri::HTML html doc.at_css("div").replace("Bar") end end end end end nokogiri-1.6.1/test/xml/test_reader_encoding.rb0000644000175000017500000000755412261213762021206 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module XML if RUBY_VERSION =~ /^1\.9/ class TestReaderEncoding < Nokogiri::TestCase def setup super @reader = Nokogiri::XML::Reader( File.read(XML_FILE), XML_FILE, 'UTF-8' ) end def test_attribute_at @reader.each do |node| next unless attribute = node.attribute_at(0) assert_equal @reader.encoding, attribute.encoding.name end end def test_attributes @reader.each do |node| node.attributes.each do |k,v| assert_equal @reader.encoding, k.encoding.name assert_equal @reader.encoding, v.encoding.name end end end def test_attribute xml = <<-eoxml snuggles! eoxml reader = Nokogiri::XML::Reader(xml, nil, 'UTF-8') reader.each do |node| next unless attribute = node.attribute('awesome') assert_equal reader.encoding, attribute.encoding.name end end def test_xml_version @reader.each do |node| next unless version = node.xml_version assert_equal @reader.encoding, version.encoding.name end end def test_lang xml = <<-eoxml

The quick brown fox jumps over the lazy dog.

日本語が上手です

eoxml reader = Nokogiri::XML::Reader(xml, nil, 'UTF-8') reader.each do |node| next unless lang = node.lang assert_equal reader.encoding, lang.encoding.name end end def test_value called = false @reader.each do |node| next unless value = node.value assert_equal @reader.encoding, value.encoding.name called = true end assert called end def test_prefix xml = <<-eoxml hello eoxml reader = Nokogiri::XML::Reader(xml, nil, 'UTF-8') reader.each do |node| next unless prefix = node.prefix assert_equal reader.encoding, prefix.encoding.name end end def test_ns_uri xml = <<-eoxml hello eoxml reader = Nokogiri::XML::Reader(xml, nil, 'UTF-8') reader.each do |node| next unless uri = node.namespace_uri assert_equal reader.encoding, uri.encoding.name end end def test_local_name xml = <<-eoxml hello eoxml reader = Nokogiri::XML::Reader(xml, nil, 'UTF-8') reader.each do |node| next unless lname = node.local_name assert_equal reader.encoding, lname.encoding.name end end def test_name @reader.each do |node| next unless name = node.name assert_equal @reader.encoding, name.encoding.name end end def test_value_lookup_segfault skip("JRuby doesn't do GC.") if Nokogiri.jruby? 
old_stress = GC.stress begin GC.stress = true while node = @reader.read nodes = node.send(:attr_nodes) nodes.first.name if nodes.first end ensure GC.stress = old_stress end end end end end end nokogiri-1.6.1/test/xml/test_entity_reference.rb0000644000175000017500000002015012261213762021413 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestEntityReference < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(File.open(XML_FILE), XML_FILE) end def test_new assert ref = EntityReference.new(@xml, 'ent4') assert_instance_of EntityReference, ref end def test_many_references 100.times { EntityReference.new(@xml, 'foo') } end def test_newline_node # issue 719 xml = < EOF doc = Nokogiri::XML xml lf_node = Nokogiri::XML::EntityReference.new(doc, "#xa") doc.xpath('/item').first.add_child(lf_node) assert_match / /, doc.to_xml end end module Common PATH = 'test/files/test_document_url/' attr_accessor :path, :parser def xml_document File.join path, 'document.xml' end def self.included base def base.test_relative_and_absolute_path method_name, &block test_relative_path method_name, &block test_absolute_path method_name, &block end def base.test_absolute_path method_name, &block define_method "#{method_name}_with_absolute_path" do self.path = "#{File.expand_path PATH}/" instance_eval(&block) end end def base.test_relative_path method_name, &block define_method method_name do self.path = PATH instance_eval(&block) end end end end class TestDOMEntityReference < Nokogiri::TestCase include Common def setup super @parser = Nokogiri::XML::Document end test_relative_and_absolute_path :test_dom_entity_reference_with_dtdloda do # Make sure that we can parse entity references and include them in the document html = File.read xml_document doc = @parser.parse html, path do |cfg| cfg.default_xml cfg.dtdload cfg.noent end assert_equal [], doc.errors assert_equal "foobar", doc.xpath('//blah').text end test_relative_and_absolute_path 
:test_dom_entity_reference_with_dtdvalid do # Make sure that we can parse entity references and include them in the document html = File.read xml_document doc = @parser.parse html, path do |cfg| cfg.default_xml cfg.dtdvalid cfg.noent end assert_equal [], doc.errors assert_equal "foobar", doc.xpath('//blah').text end test_absolute_path :test_dom_dtd_loading_with_absolute_path do # Make sure that we can parse entity references and include them in the document html = %Q[ &bar; ] doc = @parser.parse html, xml_document do |cfg| cfg.default_xml cfg.dtdvalid cfg.noent end assert_equal [], doc.errors assert_equal "foobar", doc.xpath('//blah').text end test_relative_and_absolute_path :test_dom_entity_reference_with_io do # Make sure that we can parse entity references and include them in the document html = File.open xml_document doc = @parser.parse html, nil do |cfg| cfg.default_xml cfg.dtdload cfg.noent end assert_equal [], doc.errors assert_equal "foobar", doc.xpath('//blah').text end test_relative_and_absolute_path :test_dom_entity_reference_without_noent do # Make sure that we don't include entity references unless NOENT is set to true html = File.read xml_document doc = @parser.parse html, path do |cfg| cfg.default_xml cfg.dtdload end assert_equal [], doc.errors assert_kind_of Nokogiri::XML::EntityReference, doc.xpath('//body').first.children.first end test_relative_and_absolute_path :test_dom_entity_reference_without_dtdload do # Make sure that we don't include entity references unless NOENT is set to true html = File.read xml_document doc = @parser.parse html, path do |cfg| cfg.default_xml end assert_kind_of Nokogiri::XML::EntityReference, doc.xpath('//body').first.children.first if Nokogiri.uses_libxml? 
assert_equal ["Entity 'bar' not defined"], doc.errors.map(&:to_s) end end test_relative_and_absolute_path :test_document_dtd_loading_with_nonet do # Make sure that we don't include remote entities unless NOENT is set to true html = %Q[ &bar; ] doc = @parser.parse html, path do |cfg| cfg.default_xml cfg.dtdload end assert_kind_of Nokogiri::XML::EntityReference, doc.xpath('//body').first.children.first if Nokogiri.uses_libxml? assert_equal ["Attempt to load network entity http://foo.bar.com/", "Entity 'bar' not defined"], doc.errors.map(&:to_s) else assert_equal ["Attempt to load network entity http://foo.bar.com/"], doc.errors.map(&:to_s) end end # TODO: can we retrieve a resource pointing to localhost when NONET is set to true? end class TestSaxEntityReference < Nokogiri::SAX::TestCase include Common def setup super @parser = XML::SAX::Parser.new(Doc.new) do |ctx| ctx.replace_entities = true end end test_relative_and_absolute_path :test_sax_entity_reference do # Make sure that we can parse entity references and include them in the document html = File.read xml_document @parser.parse html refute_nil @parser.document.errors assert_equal ["Entity 'bar' not defined"], @parser.document.errors.map(&:to_s).map(&:strip) end test_relative_and_absolute_path :test_more_sax_entity_reference do # Make sure that we don't include entity references unless NOENT is set to true html = %Q[ &bar; ] @parser.parse html refute_nil @parser.document.errors assert_equal ["Entity 'bar' not defined"], @parser.document.errors.map(&:to_s).map(&:strip) end end class TestReaderEntityReference < Nokogiri::TestCase include Common def setup super end test_relative_and_absolute_path :test_reader_entity_reference do # Make sure that we can parse entity references and include them in the document html = File.read xml_document reader = Nokogiri::XML::Reader html, path do |cfg| cfg.default_xml cfg.dtdload cfg.noent end nodes = [] reader.each { |n| nodes << n.value } assert_equal ['foobar'],
nodes.compact.map(&:strip).reject(&:empty?) end test_relative_and_absolute_path :test_reader_entity_reference_without_noent do # Make sure that we can parse entity references and include them in the document html = File.read xml_document reader = Nokogiri::XML::Reader html, path do |cfg| cfg.default_xml cfg.dtdload end nodes = [] reader.each { |n| nodes << n.value } assert_equal [], nodes.compact.map(&:strip).reject(&:empty?) end test_relative_and_absolute_path :test_reader_entity_reference_without_dtdload do html = File.read xml_document assert_raises(Nokogiri::XML::SyntaxError) do reader = Nokogiri::XML::Reader html, path do |cfg| cfg.default_xml end reader.each { |n| n } end end end end end nokogiri-1.6.1/test/xml/test_comment.rb0000644000175000017500000000123112261213762017522 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestComment < Nokogiri::TestCase def setup super @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) end def test_new comment = Nokogiri::XML::Comment.new(@xml, 'hello world') assert_equal('', comment.to_s) end def test_comment? comment = Nokogiri::XML::Comment.new(@xml, 'hello world') assert(comment.comment?) assert(!@xml.root.comment?) 
end def test_many_comments 100.times { Nokogiri::XML::Comment.new(@xml, 'hello world') } end end end end nokogiri-1.6.1/test/xml/test_attr.rb0000644000175000017500000000350212261213762017035 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestAttr < Nokogiri::TestCase def test_new 100.times { doc = Nokogiri::XML::Document.new assert doc assert Nokogiri::XML::Attr.new(doc, 'foo') } end def test_content= xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) address = xml.xpath('//address')[3] street = address.attributes['street'] street.content = "Y&ent1;" assert_equal "Y&ent1;", street.value end def test_value= xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) address = xml.xpath('//address')[3] street = address.attributes['street'] street.value = "Y&ent1;" assert_equal "Y&ent1;", street.value end def test_unlink xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) address = xml.xpath('/staff/employee/address').first assert_equal 'Yes', address['domestic'] address.attribute_nodes.first.unlink assert_nil address['domestic'] end def test_parsing_attribute_namespace doc = Nokogiri::XML <<-EOXML
EOXML node = doc.at_css "div" attr = node.attributes["myattr"] assert_equal "http://flavorjon.es/", attr.namespace.href end def test_setting_attribute_namespace doc = Nokogiri::XML <<-EOXML
EOXML node = doc.at_css "div" attr = node.attributes["myattr"] attr.add_namespace("fizzle", "http://fizzle.com/") assert_equal "http://fizzle.com/", attr.namespace.href end end end end nokogiri-1.6.1/test/xml/test_schema.rb0000644000175000017500000000564412261213762017334 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestSchema < Nokogiri::TestCase def setup assert @xsd = Nokogiri::XML::Schema(File.read(PO_SCHEMA_FILE)) end def test_schema_from_document doc = Nokogiri::XML(File.open(PO_SCHEMA_FILE)) assert doc xsd = Nokogiri::XML::Schema.from_document doc assert_instance_of Nokogiri::XML::Schema, xsd end def test_schema_from_document_node doc = Nokogiri::XML(File.open(PO_SCHEMA_FILE)) assert doc xsd = Nokogiri::XML::Schema.from_document doc.root assert_instance_of Nokogiri::XML::Schema, xsd end def test_schema_validates_with_relative_paths xsd = File.join(ASSETS_DIR, 'foo', 'foo.xsd') xml = File.join(ASSETS_DIR, 'valid_bar.xml') doc = Nokogiri::XML(File.open(xsd)) xsd = Nokogiri::XML::Schema.from_document doc doc = Nokogiri::XML(File.open(xml)) assert xsd.valid?(doc) end def test_parse_with_memory assert_instance_of Nokogiri::XML::Schema, @xsd assert_equal 0, @xsd.errors.length end def test_new assert xsd = Nokogiri::XML::Schema.new(File.read(PO_SCHEMA_FILE)) assert_instance_of Nokogiri::XML::Schema, xsd end def test_parse_with_io xsd = nil File.open(PO_SCHEMA_FILE, 'rb') { |f| assert xsd = Nokogiri::XML::Schema(f) } assert_equal 0, xsd.errors.length end def test_parse_with_errors xml = File.read(PO_SCHEMA_FILE).sub(/name="/, 'name=') assert_raises(Nokogiri::XML::SyntaxError) { Nokogiri::XML::Schema(xml) } end def test_validate_document doc = Nokogiri::XML(File.read(PO_XML_FILE)) assert errors = @xsd.validate(doc) assert_equal 0, errors.length end def test_validate_file assert errors = @xsd.validate(PO_XML_FILE) assert_equal 0, errors.length end def test_validate_invalid_document read_doc = File.read(PO_XML_FILE).gsub(/[^<]*<\/city>/, '') 
assert errors = @xsd.validate(Nokogiri::XML(read_doc)) assert_equal 2, errors.length end def test_validate_non_document string = File.read(PO_XML_FILE) assert_raise(ArgumentError) {@xsd.validate(string)} end def test_valid? valid_doc = Nokogiri::XML(File.read(PO_XML_FILE)) invalid_doc = Nokogiri::XML( File.read(PO_XML_FILE).gsub(/[^<]*<\/city>/, '') ) assert(@xsd.valid?(valid_doc)) assert(!@xsd.valid?(invalid_doc)) end def test_xsd_with_dtd Dir.chdir(File.join(ASSETS_DIR, 'saml')) do # works Nokogiri::XML::Schema(IO.read('xmldsig_schema.xsd')) # was not working Nokogiri::XML::Schema(IO.read('saml20protocol_schema.xsd')) end end end end end nokogiri-1.6.1/test/xml/test_c14n.rb0000644000175000017500000001061612261213762016634 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestC14N < Nokogiri::TestCase # http://www.w3.org/TR/xml-c14n#Example-OutsideDoc def test_3_1 doc = Nokogiri.XML <<-eoxml Hello, world! eoxml c14n = doc.canonicalize assert_no_match(/version=/, c14n) assert_match(/Hello, world/, c14n) assert_no_match(/Comment/, c14n) c14n = doc.canonicalize(nil, nil, true) assert_match(/Comment/, c14n) end def test_exclude_block_params xml = '
' doc = Nokogiri.XML xml list = [] doc.canonicalize do |node, parent| list << [node, parent] true end if Nokogiri.jruby? assert_equal( ['a', 'document', 'document', nil, 'b', 'a'], list.flatten.map { |x| x ? x.name : x } ) else assert_equal( ['a', 'document', 'document', nil, 'b', 'a', 'a', 'document'], list.flatten.map { |x| x ? x.name : x } ) end end def test_exclude_block_true xml = '' doc = Nokogiri.XML xml c14n = doc.canonicalize do |node, parent| true end assert_equal xml, c14n end def test_exclude_block_false xml = '' doc = Nokogiri.XML xml c14n = doc.canonicalize do |node, parent| false end assert_equal '', c14n end def test_exclude_block_nil xml = '' doc = Nokogiri.XML xml c14n = doc.canonicalize do |node, parent| nil end assert_equal '', c14n end def test_exclude_block_object xml = '' doc = Nokogiri.XML xml c14n = doc.canonicalize do |node, parent| Object.new end assert_equal xml, c14n end def test_c14n_node xml = '' doc = Nokogiri.XML xml c14n = doc.at_xpath('//b').canonicalize assert_equal '', c14n end def test_c14n_modes skip("C14N Exclusive implementation will complete by next version after 1.5.1") if Nokogiri.jruby? 
# http://www.w3.org/TR/xml-exc-c14n/#sec-Enveloping doc1 = Nokogiri.XML <<-eoxml eoxml doc2 = Nokogiri.XML <<-eoxml eoxml c14n = doc1.at_xpath('//n1:elem2', {'n1' => 'http://example.net'}).canonicalize assert_equal ' ', c14n c14n = doc2.at_xpath('//n1:elem2', {'n1' => 'http://example.net'}).canonicalize assert_equal ' ', c14n excl_c14n = ' ' c14n = doc1.at_xpath('//n1:elem2', {'n1' => 'http://example.net'}).canonicalize(XML::XML_C14N_EXCLUSIVE_1_0) assert_equal excl_c14n, c14n c14n = doc2.at_xpath('//n1:elem2', {'n1' => 'http://example.net'}).canonicalize(XML::XML_C14N_EXCLUSIVE_1_0) assert_equal excl_c14n, c14n c14n = doc2.at_xpath('//n1:elem2', {'n1' => 'http://example.net'}).canonicalize(XML::XML_C14N_EXCLUSIVE_1_0, ['n2']) assert_equal ' ', c14n end end end end nokogiri-1.6.1/test/xml/test_namespace.rb0000644000175000017500000000553612261213762020030 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestNamespace < Nokogiri::TestCase def setup super @xml = Nokogiri::XML <<-eoxml eoxml end if Nokogiri.uses_libxml? 
def test_namespace_is_in_node_cache node = @xml.root.namespace assert @xml.instance_variable_get(:@node_cache).include?(node) end end def test_built_nodes_keep_namespace_decls doc = Document.new e = Node.new 'element', doc c = Node.new 'child', doc c.default_namespace = 'woop:de:doo' assert c.namespace, 'has a namespace' e.add_child c assert c.namespace, 'has a namespace' doc.add_child e assert c.namespace, 'has a namespace' end def test_inspect ns = @xml.root.namespace assert_equal "#<#{ns.class.name}:#{sprintf("0x%x", ns.object_id)} href=#{ns.href.inspect}>", ns.inspect end def test_namespace_node_prefix namespaces = @xml.root.namespace_definitions assert_equal [nil, 'foo'], namespaces.map { |x| x.prefix } end def test_namespace_node_href namespaces = @xml.root.namespace_definitions assert_equal [ 'http://tenderlovemaking.com/', 'bar' ], namespaces.map { |x| x.href } end def test_equality namespaces = @xml.root.namespace_definitions assert_equal namespaces, @xml.root.namespace_definitions end def test_add_definition @xml.root.add_namespace_definition('baz', 'bar') assert_equal 3, @xml.root.namespace_definitions.length end def test_add_definition_return ns = @xml.root.add_namespace_definition('baz', 'bar') assert_equal 'baz', ns.prefix end def test_remove_entity_namespace s = %q{]>} Nokogiri::XML(s).remove_namespaces! 
end def test_maintain_element_namespaces doc = Document.new subject = Nokogiri::XML::Node.new 'foo', doc subject << '' child = subject.children.first assert_equal 'foobar', child.name assert_equal 'barfoo', child.namespace.href assert_empty child.attributes end def test_maintain_element_namespaces_in_urn doc = Document.new subject = Nokogiri::XML::Node.new 'foo', doc subject << '' child = subject.children.first assert_equal 'foobar', child.name assert_equal 'urn:xmpp:foospec:barfoo', child.namespace.href assert_empty child.attributes end end end end nokogiri-1.6.1/test/xml/test_node.rb0000644000175000017500000010570412261213762017017 0ustar boutilboutilrequire "helper" require 'stringio' module Nokogiri module XML class TestNode < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE) end def test_first_element_child node = @xml.root.first_element_child assert_equal 'employee', node.name assert node.element?, 'node is an element' end def test_element_children nodes = @xml.root.element_children assert_equal @xml.root.first_element_child, nodes.first assert nodes.all? { |node| node.element? 
}, 'all nodes are elements' end def test_last_element_child nodes = @xml.root.element_children assert_equal nodes.last, @xml.root.element_children.last end def test_bad_xpath bad_xpath = '//foo[' begin @xml.xpath(bad_xpath) rescue Nokogiri::XML::XPath::SyntaxError => e assert_match(bad_xpath, e.to_s) end end def test_namespace_type_error assert_raises(TypeError) do @xml.root.namespace = Object.new end end def test_remove_namespace @xml = Nokogiri::XML('') tag = @xml.at('s') assert tag.namespace tag.namespace = nil assert_nil tag.namespace end def test_parse_needs_doc list = @xml.root.parse('fooooooo ') assert_equal 1, list.css('hello').length end def test_parse list = @xml.root.parse('fooooooo ') assert_equal 2, list.length end def test_parse_with_block called = false list = @xml.root.parse('') { |cfg| called = true assert_instance_of Nokogiri::XML::ParseOptions, cfg } assert called, 'config block called' assert_equal 1, list.length end def test_parse_with_io list = @xml.root.parse(StringIO.new('')) assert_equal 1, list.length assert_equal 'hello', list.first.name end def test_parse_with_empty_string list = @xml.root.parse('') assert_equal 0, list.length end def test_parse_config_option node = @xml.root options = nil node.parse("") do |config| options = config end assert_equal Nokogiri::XML::ParseOptions::DEFAULT_XML, options.to_i end # descriptive, not prescriptive. def test_parse_invalid_html_markup_results_in_empty_nodeset doc = Nokogiri::HTML("") nodeset = doc.root.parse "
a
b
" assert_equal 1, doc.errors.length # "Tag snippet invalid" assert_equal 1, nodeset.length end def test_node_context_parsing_of_malformed_html_fragment_with_recover_is_corrected doc = HTML.parse "
" context_node = doc.at_css "div" nodeset = context_node.parse("
") do |options| options.recover end assert_equal "
", nodeset.to_s assert_equal 1, doc.errors.length assert_equal 1, nodeset.length end def test_node_context_parsing_of_malformed_html_fragment_without_recover_is_not_corrected doc = HTML.parse "
" context_node = doc.at_css "div" nodeset = context_node.parse("
") do |options| options.strict end assert_equal 1, doc.errors.length assert_equal 0, nodeset.length end def test_parse_error_list error_count = @xml.errors.length @xml.root.parse('') assert(error_count < @xml.errors.length, "errors should have increased") end def test_parse_error_on_fragment_with_empty_document doc = Document.new fragment = DocumentFragment.new(doc, '') node = fragment%'bar' node.parse('<') end def test_subclass_dup subclass = Class.new(Nokogiri::XML::Node) node = subclass.new('foo', @xml).dup assert_instance_of subclass, node end def test_gt_string_arg node = @xml.at('employee') nodes = (node > 'name') assert_equal 1, nodes.length assert_equal node, nodes.first.parent end def test_next_element_when_next_sibling_is_element_should_return_next_sibling doc = Nokogiri::XML "" node = doc.at_css("foo") next_element = node.next_element assert next_element.element? assert_equal doc.at_css("quux"), next_element end def test_next_element_when_there_is_no_next_sibling_should_return_nil doc = Nokogiri::XML "" assert_nil doc.at_css("quux").next_element end def test_next_element_when_next_sibling_is_not_an_element_should_return_closest_next_element_sibling doc = Nokogiri::XML "bar" node = doc.at_css("foo") next_element = node.next_element assert next_element.element? assert_equal doc.at_css("quux"), next_element end def test_next_element_when_next_sibling_is_not_an_element_and_no_following_element_should_return_nil doc = Nokogiri::XML "bar" node = doc.at_css("foo") next_element = node.next_element assert_nil next_element end def test_previous_element_when_previous_sibling_is_element_should_return_previous_sibling doc = Nokogiri::XML "" node = doc.at_css("quux") previous_element = node.previous_element assert previous_element.element? 
assert_equal doc.at_css("foo"), previous_element end def test_previous_element_when_there_is_no_previous_sibling_should_return_nil doc = Nokogiri::XML "" assert_nil doc.at_css("foo").previous_element end def test_previous_element_when_previous_sibling_is_not_an_element_should_return_closest_previous_element_sibling doc = Nokogiri::XML "bar" node = doc.at_css("quux") previous_element = node.previous_element assert previous_element.element? assert_equal doc.at_css("foo"), previous_element end def test_previous_element_when_previous_sibling_is_not_an_element_and_no_following_element_should_return_nil doc = Nokogiri::XML "foo" node = doc.at_css("bar") previous_element = node.previous_element assert_nil previous_element end def test_element? assert @xml.root.element?, 'is an element' end def test_slash_search assert_equal 'EMP0001', (@xml/:staff/:employee/:employeeId).first.text end def test_append_with_document assert_raises(ArgumentError) do @xml.root << Nokogiri::XML::Document.new end end def test_append_with_attr r = Nokogiri.XML('').root assert_raises(ArgumentError) do r << r.at_xpath('@a') end end def test_inspect_ns xml = Nokogiri::XML(<<-eoxml) { |c| c.noblanks } eoxml ins = xml.inspect xml.traverse do |node| assert_match node.class.name, ins if node.respond_to? 
:attributes node.attributes.each do |k,v| assert_match k, ins assert_match v, ins end end if node.respond_to?(:namespace) && node.namespace assert_match node.namespace.class.name, ins assert_match node.namespace.href, ins end end end def test_namespace_definitions_when_some_exist xml = Nokogiri::XML <<-eoxml eoxml namespace_definitions = xml.root.namespace_definitions assert_equal 2, namespace_definitions.length end def test_namespace_definitions_when_no_exist xml = Nokogiri::XML <<-eoxml eoxml namespace_definitions = xml.at_xpath('//xmlns:awesome').namespace_definitions assert_equal 0, namespace_definitions.length end def test_namespace_goes_to_children fruits = Nokogiri::XML(<<-eoxml) eoxml apple = Nokogiri::XML::Node.new('Apple', fruits) orange = Nokogiri::XML::Node.new('Orange', fruits) apple << orange fruits.root << apple assert fruits.at('//fruit:Orange',{'fruit'=>'www.fruits.org'}) assert fruits.at('//fruit:Apple',{'fruit'=>'www.fruits.org'}) end def test_description assert_nil @xml.at('employee').description end def test_spaceship nodes = @xml.xpath('//employee') assert_equal(-1, (nodes.first <=> nodes.last)) list = [nodes.first, nodes.last].sort assert_equal nodes.first, list.first assert_equal nodes.last, list.last end def test_incorrect_spaceship nodes = @xml.xpath('//employee') assert_nil(nodes.first <=> 'asdf') end def test_document_compare nodes = @xml.xpath('//employee') assert_equal(-1, (nodes.first <=> @xml)) end def test_different_document_compare nodes = @xml.xpath('//employee') doc = Nokogiri::XML('') b = doc.at('b') assert_nil(nodes.first <=> b) end def test_duplicate_node_removes_namespace fruits = Nokogiri::XML(<<-eoxml) eoxml apple = fruits.root.xpath('fruit:Apple', {'fruit'=>'www.fruits.org'})[0] new_apple = apple.dup fruits.root << new_apple assert_equal 2, fruits.xpath('//xmlns:Apple').length assert_equal 1, fruits.to_xml.scan('www.fruits.org').length end [:clone, :dup].each do |symbol| define_method "test_#{symbol}" do node = 
@xml.at('//employee') other = node.send(symbol) assert_equal "employee", other.name assert_nil other.parent end end def test_fragment_creates_elements apple = @xml.fragment('') apple.children.each do |child| assert_equal Nokogiri::XML::Node::ELEMENT_NODE, child.type assert_instance_of Nokogiri::XML::Element, child end end def test_node_added_to_root_should_get_namespace fruits = Nokogiri::XML(<<-eoxml) eoxml apple = fruits.fragment('') fruits.root << apple assert_equal 1, fruits.xpath('//xmlns:Apple').length end def test_new_node_can_have_ancestors xml = Nokogiri::XML('text') item = Nokogiri::XML::Element.new('item', xml) assert_equal 0, item.ancestors.length end def test_children doc = Nokogiri::XML(<<-eoxml) #{'' * 9 } eoxml assert_equal 9, doc.root.children.length assert_equal 9, doc.root.children.to_a.length doc = Nokogiri::XML(<<-eoxml) #{'' * 15 } eoxml assert_equal 15, doc.root.children.length assert_equal 15, doc.root.children.to_a.length end def test_add_namespace node = @xml.at('address') node.add_namespace('foo', 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', node.namespaces['xmlns:foo'] end def test_add_namespace_twice node = @xml.at('address') ns = node.add_namespace('foo', 'http://tenderlovemaking.com') ns2 = node.add_namespace('foo', 'http://tenderlovemaking.com') assert_equal ns, ns2 end def test_add_default_ns node = @xml.at('address') node.add_namespace(nil, 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', node.namespaces['xmlns'] end def test_add_multiple_namespaces node = @xml.at('address') node.add_namespace(nil, 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', node.namespaces['xmlns'] node.add_namespace('foo', 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', node.namespaces['xmlns:foo'] end def test_default_namespace= node = @xml.at('address') node.default_namespace = 'http://tenderlovemaking.com' assert_equal 
'http://tenderlovemaking.com', node.namespaces['xmlns'] end def test_namespace= node = @xml.at('address') assert_nil node.namespace definition = node.add_namespace_definition 'bar', 'http://tlm.com/' node.namespace = definition assert_equal definition, node.namespace assert_equal node, @xml.at('//foo:address', { 'foo' => 'http://tlm.com/' }) end def test_add_namespace_with_nil_associates_node node = @xml.at('address') assert_nil node.namespace definition = node.add_namespace_definition nil, 'http://tlm.com/' assert_equal definition, node.namespace end def test_add_namespace_does_not_associate_node node = @xml.at('address') assert_nil node.namespace assert node.add_namespace_definition 'foo', 'http://tlm.com/' assert_nil node.namespace end def test_set_namespace_from_different_doc node = @xml.at('address') doc = Nokogiri::XML(File.read(XML_FILE), XML_FILE) decl = doc.root.add_namespace_definition 'foo', 'bar' assert_raises(ArgumentError) do node.namespace = decl end end def test_set_namespace_must_only_take_a_namespace node = @xml.at('address') assert_raises(TypeError) do node.namespace = node end end def test_at node = @xml.at('address') assert_equal node, @xml.xpath('//address').first end def test_at_self node = @xml.at('address') assert_equal node, node.at('.') end def test_at_xpath node = @xml.at_xpath('//address') nodes = @xml.xpath('//address') assert_equal 5, nodes.size assert_equal node, nodes.first end def test_at_css node = @xml.at_css('address') nodes = @xml.css('address') assert_equal 5, nodes.size assert_equal node, nodes.first end def test_percent node = @xml % ('address') assert_equal node, @xml.xpath('//address').first end def test_accept visitor = Class.new { attr_accessor :visited def accept target target.accept(self) end def visit node node.children.each { |c| c.accept(self) } @visited = true end }.new visitor.accept(@xml.root) assert visitor.visited end def test_write_to io = StringIO.new @xml.write_to io io.rewind assert_equal @xml.to_xml, 
io.read end def test_attribute_with_symbol assert_equal 'Yes', @xml.css('address').first[:domestic] end def test_non_existent_attribute_should_return_nil node = @xml.root.first_element_child assert_nil node.attribute('type') end def test_write_to_with_block called = false io = StringIO.new conf = nil @xml.write_to io do |config| called = true conf = config config.format.as_html.no_empty_tags end io.rewind assert called string = io.read assert_equal @xml.serialize(nil, conf.options), string assert_equal @xml.serialize(nil, conf), string end %w{ xml html xhtml }.each do |type| define_method(:"test_write_#{type}_to") do io = StringIO.new assert @xml.send(:"write_#{type}_to", io) io.rewind assert_match @xml.send(:"to_#{type}"), io.read end end def test_serialize_with_block called = false conf = nil string = @xml.serialize do |config| called = true conf = config config.format.as_html.no_empty_tags end assert called assert_equal @xml.serialize(nil, conf.options), string assert_equal @xml.serialize(nil, conf), string end def test_hold_refence_to_subnode doc = Nokogiri::XML(<<-eoxml) eoxml assert node_a = doc.css('a').first assert node_b = node_a.css('b').first node_a.unlink assert_equal 'b', node_b.name end def test_values assert_equal %w{ Yes Yes }, @xml.xpath('//address')[1].values end def test_keys assert_equal %w{ domestic street }, @xml.xpath('//address')[1].keys end def test_each attributes = [] @xml.xpath('//address')[1].each do |key, value| attributes << [key, value] end assert_equal [['domestic', 'Yes'], ['street', 'Yes']], attributes end def test_new assert node = Nokogiri::XML::Node.new('input', @xml) assert_equal 1, node.node_type assert_instance_of Nokogiri::XML::Element, node end def test_to_str name = @xml.xpath('//name').first assert_match(/Margaret/, '' + name) assert_equal('Margaret Martin', '' + name.children.first) end def test_ancestors address = @xml.xpath('//address').first assert_equal 3, address.ancestors.length assert_equal ['employee', 'staff', 
'document'], address.ancestors.map { |x| x.name } end def test_read_only? assert entity_decl = @xml.internal_subset.children.find { |x| x.type == Node::ENTITY_DECL } assert entity_decl.read_only? end def test_remove_attribute address = @xml.xpath('/staff/employee/address').first assert_equal 'Yes', address['domestic'] address.remove_attribute 'domestic' assert_nil address['domestic'] end def test_attribute_setter_accepts_non_string address = @xml.xpath("/staff/employee/address").first assert_equal "Yes", address[:domestic] address[:domestic] = "Altered Yes" assert_equal "Altered Yes", address[:domestic] end def test_attribute_accessor_accepts_non_string address = @xml.xpath("/staff/employee/address").first assert_equal "Yes", address["domestic"] assert_equal "Yes", address[:domestic] end def test_empty_attribute_reading node = Nokogiri::XML '' assert_equal '', node.root['empty'] assert_equal ' ', node.root['whitespace'] end def test_delete address = @xml.xpath('/staff/employee/address').first assert_equal 'Yes', address['domestic'] address.delete 'domestic' assert_nil address['domestic'] end def test_set_content_with_symbol node = @xml.at('//name') node.content = :foo assert_equal 'foo', node.content end def test_set_native_content_is_unescaped comment = Nokogiri.XML('').at('//comment()') comment.native_content = " < " # content= will escape this string assert_equal "", comment.to_xml end def test_find_by_css_class_with_nonstandard_whitespace doc = Nokogiri::HTML '
' assert_not_nil doc.at_css(".b") end def test_find_by_css_with_tilde_eql xml = Nokogiri::XML.parse(<<-eoxml) Hello world Bar Bar Bar Bar Awesome Awesome eoxml set = xml.css('a[@class~="bar"]') assert_equal 4, set.length assert_equal ['Bar'], set.map { |node| node.content }.uniq end def test_unlink xml = Nokogiri::XML.parse(<<-eoxml) Bar Bar Bar Hello world Bar Awesome Awesome eoxml node = xml.xpath('//a')[3] assert_equal('Hello world', node.text) assert_match(/Hello world/, xml.to_s) assert node.parent assert node.document assert node.previous_sibling assert node.next_sibling node.unlink assert !node.parent #assert !node.document assert !node.previous_sibling assert !node.next_sibling assert_no_match(/Hello world/, xml.to_s) end def test_next_sibling assert node = @xml.root assert sibling = node.child.next_sibling assert_equal('employee', sibling.name) end def test_previous_sibling assert node = @xml.root assert sibling = node.child.next_sibling assert_equal('employee', sibling.name) assert_equal(sibling.previous_sibling, node.child) end def test_name= assert node = @xml.root node.name = 'awesome' assert_equal('awesome', node.name) end def test_child assert node = @xml.root assert child = node.child assert_equal('text', child.name) end def test_key? 
assert node = @xml.search('//address').first assert(!node.key?('asdfasdf')) end def test_set_property assert node = @xml.search('//address').first node['foo'] = 'bar' assert_equal('bar', node['foo']) end def test_set_property_non_string assert node = @xml.search('//address').first node['foo'] = 1 assert_equal('1', node['foo']) node['foo'] = false assert_equal('false', node['foo']) end def test_attributes assert node = @xml.search('//address').first assert_nil(node['asdfasdfasdf']) assert_equal('Yes', node['domestic']) assert node = @xml.search('//address')[2] attr = node.attributes assert_equal 2, attr.size assert_equal 'Yes', attr['domestic'].value assert_equal 'Yes', attr['domestic'].to_s assert_equal 'No', attr['street'].value end def test_path assert set = @xml.search('//employee') assert node = set.first assert_equal('/staff/employee[1]', node.path) end def test_parent_xpath assert set = @xml.search('//employee') assert node = set.first assert parent_set = node.search('..') assert parent_node = parent_set.first assert_equal '/staff', parent_node.path assert_equal node.parent, parent_node end def test_search_self node = @xml.at('//employee') assert_equal node.search('.').to_a, [node] end def test_search_by_symbol assert set = @xml.search(:employee) assert_equal 5, set.length assert node = @xml.at(:employee) assert node.text =~ /EMP0001/ end def test_new_node node = Nokogiri::XML::Node.new('form', @xml) assert_equal('form', node.name) assert(node.document) end def test_encode_special_chars foo = @xml.css('employee').first.encode_special_chars('&') assert_equal '&', foo end def test_content_equals node = Nokogiri::XML::Node.new('form', @xml) assert_equal('', node.content) node.content = 'hello world!' assert_equal('hello world!', node.content) node.content = '& &' assert_equal('& &', node.content) assert_equal('
& <foo> &amp;
', node.to_xml) node.content = "1234 <-> 1234" assert_equal "1234 <-> 1234", node.content assert_equal "
1234 <-> 1234
", node.to_xml node.content = '1234' node.add_child '5678' assert_equal '12345678', node.content end # issue #839 def test_encoding_of_copied_nodes d1 = Nokogiri::XML('&') d2 = Nokogiri::XML('') ne = d1.root.xpath('//a').first.dup(1) ne.content += "& < & > \" &" d2.root << ne assert_match /&& < & > " &<\/a>/, d2.to_s end def test_content_after_appending_text doc = Nokogiri::XML '' node = doc.children.first node.content = 'bar' node << 'baz' assert_equal 'barbaz', node.content end def test_content_depth_first node = Nokogiri::XML 'firstsecondthird' assert_equal 'firstsecondthird', node.content end def test_set_content_should_unlink_existing_content node = @xml.at_css("employee") children = node.children node.content = "hello" children.each { |child| assert_nil child.parent } end def test_whitespace_nodes doc = Nokogiri::XML.parse("Foo\nBar

Bazz

") children = doc.at('//root').children.collect{|j| j.to_s} assert_equal "\n", children[1] assert_equal " ", children[3] end def test_node_equality doc1 = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) doc2 = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) address1_1 = doc1.xpath('//address').first address1_2 = doc1.xpath('//address').first address2 = doc2.xpath('//address').first assert_not_equal address1_1, address2 # two references to very, very similar nodes assert_equal address1_1, address1_2 # two references to the exact same node end def test_namespace_search_with_xpath_and_hash xml = Nokogiri::XML.parse(<<-eoxml) Michelin Model XGV I'm a bicycle tire! eoxml tires = xml.xpath('//bike:tire', {'bike' => 'http://schwinn.com/'}) assert_equal 1, tires.length end def test_namespace_search_with_xpath_and_hash_with_symbol_keys xml = Nokogiri::XML.parse(<<-eoxml) Michelin Model XGV I'm a bicycle tire! eoxml tires = xml.xpath('//bike:tire', :bike => 'http://schwinn.com/') assert_equal 1, tires.length end def test_namespace_search_with_css xml = Nokogiri::XML.parse(<<-eoxml) Michelin Model XGV I'm a bicycle tire! 
eoxml tires = xml.css('bike|tire', 'bike' => 'http://schwinn.com/' ) assert_equal 1, tires.length end def test_namespaced_attribute_search_with_xpath # from #593 xmlContent = <<-EOXML with namespace no namespace EOXML xml_doc = Nokogiri::XML(xmlContent) no_ns = xml_doc.xpath("//*[@att]") assert_equal no_ns.length, 1 assert_equal no_ns.first.content, "no namespace" with_ns = xml_doc.xpath("//*[@ns1:att]") assert_equal with_ns.length, 1 assert_equal with_ns.first.content, "with namespace" end def test_namespaced_attribute_search_with_css # from #593 xmlContent = <<-EOXML with namespace no namespace EOXML xml_doc = Nokogiri::XML(xmlContent) no_ns = xml_doc.css('*[att]') assert_equal no_ns.length, 1 assert_equal no_ns.first.content, "no namespace" with_ns = xml_doc.css('*[ns1|att]') assert_equal with_ns.length, 1 assert_equal with_ns.first.content, "with namespace" end def test_namespaces_should_include_all_namespace_definitions xml = Nokogiri::XML.parse(<<-EOF) hello
EOF namespaces = xml.namespaces # Document#namespace assert_equal({"xmlns" => "http://quux.com/", "xmlns:a" => "http://foo.com/", "xmlns:b" => "http://bar.com/"}, namespaces) namespaces = xml.root.namespaces assert_equal({"xmlns" => "http://quux.com/", "xmlns:a" => "http://foo.com/", "xmlns:b" => "http://bar.com/"}, namespaces) namespaces = xml.at_xpath("//xmlns:y").namespaces assert_equal({"xmlns" => "http://quux.com/", "xmlns:a" => "http://foo.com/", "xmlns:b" => "http://bar.com/", "xmlns:c" => "http://bazz.com/"}, namespaces) namespaces = xml.at_xpath("//xmlns:z").namespaces assert_equal({"xmlns" => "http://quux.com/", "xmlns:a" => "http://foo.com/", "xmlns:b" => "http://bar.com/", "xmlns:c" => "http://bazz.com/"}, namespaces) namespaces = xml.at_xpath("//xmlns:a").namespaces assert_equal({"xmlns" => "http://quux.com/", "xmlns:a" => "http://foo.com/", "xmlns:b" => "http://bar.com/", "xmlns:c" => "http://newc.com/"}, namespaces) end def test_namespace xml = Nokogiri::XML.parse(<<-EOF) hello a hello b hello c
hello moon
EOF set = xml.search("//y/*") assert_equal "a", set[0].namespace.prefix assert_equal "b", set[1].namespace.prefix assert_equal "c", set[2].namespace.prefix assert_equal nil, set[3].namespace end if Nokogiri.uses_libxml? def test_namespace_without_an_href_on_html_node # because microsoft word's HTML formatting does this. ick. xml = Nokogiri::HTML.parse <<-EOF
foo
EOF assert_not_nil(node = xml.at('p')) assert_equal 1, node.namespaces.keys.size assert node.namespaces.has_key?('xmlns:o') assert_equal nil, node.namespaces['xmlns:o'] end end def test_line xml = Nokogiri::XML(<<-eoxml)
Hello world eoxml set = xml.search("//a") node = set.first assert_equal 2, node.line end def test_xpath_results_have_document_and_are_decorated x = Module.new do def awesome! ; end end util_decorate(@xml, x) node_set = @xml.xpath("//employee") assert_equal @xml, node_set.document assert node_set.respond_to?(:awesome!) end def test_css_results_have_document_and_are_decorated x = Module.new do def awesome! ; end end util_decorate(@xml, x) node_set = @xml.css("employee") assert_equal @xml, node_set.document assert node_set.respond_to?(:awesome!) end def test_blank doc = Nokogiri::XML('') assert_equal false, doc.blank? end def test_to_xml_allows_to_serialize_with_as_xml_save_option xml = Nokogiri::XML("
  • Hello world
") set = xml.search("//ul") node = set.first assert_no_match("
    \n
  • ", xml.to_xml(:save_with => XML::Node::SaveOptions::AS_XML)) assert_no_match("
      \n
    • ", node.to_xml(:save_with => XML::Node::SaveOptions::AS_XML)) end
# issue 647
def test_default_namespace_should_be_created subject = Nokogiri::XML.parse('').root ns = subject.attributes['bar'].namespace assert_not_nil ns assert_equal ns.class, Nokogiri::XML::Namespace assert_equal 'xml', ns.prefix assert_equal "http://www.w3.org/XML/1998/namespace", ns.href end
# issue 648
def test_namespace_without_prefix_should_be_set node = Nokogiri::XML.parse('').root subject = Nokogiri::XML::Node.new 'foo', node.document subject.namespace = node.namespace ns = subject.namespace assert_equal ns.class, Nokogiri::XML::Namespace assert_nil ns.prefix assert_equal ns.href, "http://bar.com" end
# issue 695
def test_namespace_in_rendered_xml document = Nokogiri::XML::Document.new subject = Nokogiri::XML::Node.new 'foo', document ns = subject.add_namespace nil, 'bar' subject.namespace = ns assert_match(/xmlns="bar"/, subject.to_xml) end
# issue 771
def test_format_noblank content = < hello eoxml subject = Nokogiri::XML(content) do |conf| conf.default_xml.noblanks end assert_match %r{hello}, subject.to_xml(:indent => 2) end def test_text_node_colon document = Nokogiri::XML::Document.new root = Nokogiri::XML::Node.new 'foo', document document.root = root root << "hello:with_colon" assert_match(/hello:with_colon/, document.to_xml) end end end end
# nokogiri-1.6.1/test/xml/test_entity_decl.rb
require "helper" module Nokogiri module XML class TestEntityDecl < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(<<-eoxml) ]> eoxml @entities = @xml.internal_subset.children @entity_decl = @entities.first end def test_constants assert_equal 1, EntityDecl::INTERNAL_GENERAL assert_equal 2, EntityDecl::EXTERNAL_GENERAL_PARSED assert_equal 3, EntityDecl::EXTERNAL_GENERAL_UNPARSED assert_equal 4, EntityDecl::INTERNAL_PARAMETER assert_equal 5, EntityDecl::EXTERNAL_PARAMETER assert_equal 6, EntityDecl::INTERNAL_PREDEFINED end
def test_create_typed_entity entity = @xml.create_entity( 'foo', EntityDecl::INTERNAL_GENERAL, nil, nil, nil ) assert_equal EntityDecl::INTERNAL_GENERAL, entity.entity_type assert_equal 'foo', entity.name end def test_new entity = Nokogiri::XML::EntityDecl.new( 'foo', @xml, EntityDecl::INTERNAL_GENERAL, nil, nil, nil ) assert_equal EntityDecl::INTERNAL_GENERAL, entity.entity_type assert_equal 'foo', entity.name end def test_create_default_args entity = @xml.create_entity('foo') assert_equal EntityDecl::INTERNAL_GENERAL, entity.entity_type assert_equal 'foo', entity.name end def test_external_id assert_nil @entity_decl.external_id end def test_system_id assert_nil @entity_decl.system_id end def test_entity_type assert_equal 1, @entity_decl.entity_type end def test_original_content assert_equal "es", @entity_decl.original_content if Nokogiri.jruby? assert_nil @entities[1].original_content else assert_equal "", @entities[1].original_content end end def test_content assert_equal "es", @entity_decl.content if Nokogiri.jruby? 
assert_nil @entities[1].content else assert_equal "", @entities[1].content end end def test_type assert_equal 17, @entities.first.type end def test_class assert_instance_of Nokogiri::XML::EntityDecl, @entities.first end def test_attributes assert_raise NoMethodError do @entity_decl.attributes end end def test_namespace assert_raise NoMethodError do @entity_decl.namespace end end def test_namespace_definitions assert_raise NoMethodError do @entity_decl.namespace_definitions end end def test_line assert_raise NoMethodError do @entity_decl.line end end def test_inspect assert_equal( "#<#{@entity_decl.class.name}:#{sprintf("0x%x", @entity_decl.object_id)} #{@entity_decl.to_s.inspect}>", @entity_decl.inspect ) end end end end
# nokogiri-1.6.1/test/xml/test_element_content.rb
require "helper" module Nokogiri module XML class TestElementContent < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(<<-eoxml) ]> eoxml @elements = @xml.internal_subset.children.find_all { |x| x.type == 15 } @tree = @elements[1].content end def test_allowed_content_not_defined assert_nil @elements.first.content end def test_document assert @tree assert_equal @xml, @tree.document end def test_type assert_equal ElementContent::SEQ, @tree.type end def test_children assert_equal 2, @tree.children.length end def test_name assert_nil @tree.name assert_equal 'head', @tree.children.first.name assert_equal 'p', @tree.children[1].children.first.children.first.name end def test_occur assert_equal ElementContent::ONCE, @tree.occur end def test_prefix assert_nil @tree.prefix assert_equal 'tender', @elements[2].content.prefix end end end end
# nokogiri-1.6.1/test/xml/test_element_decl.rb
require "helper" module Nokogiri module XML class TestElementDecl < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(<<-eoxml) ]> eoxml @elements =
@xml.internal_subset.children.find_all { |x| x.type == 15 } end def test_inspect e = @elements.first assert_equal( "#<#{e.class.name}:#{sprintf("0x%x", e.object_id)} #{e.to_s.inspect}>", e.inspect ) end def test_prefix assert_nil @elements[1].prefix assert_equal 'my', @elements[2].prefix end def test_line assert_raise NoMethodError do @elements.first.line end end def test_namespace assert_raise NoMethodError do @elements.first.namespace end end def test_namespace_definitions assert_raise NoMethodError do @elements.first.namespace_definitions end end def test_element_type assert_equal 1, @elements.first.element_type end def test_type assert_equal 15, @elements.first.type end def test_class assert_instance_of Nokogiri::XML::ElementDecl, @elements.first end def test_attributes assert_equal 2, @elements.first.attribute_nodes.length assert_equal 'width', @elements.first.attribute_nodes.first.name end end end end
# nokogiri-1.6.1/test/xml/test_dtd.rb
require "helper" module Nokogiri module XML class TestDTD < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(File.open(XML_FILE)) assert @dtd = @xml.internal_subset end def test_system_id assert_equal 'staff.dtd', @dtd.system_id end def test_external_id xml = Nokogiri::XML('') assert dtd = xml.internal_subset, 'no internal subset' assert_equal 'bar', dtd.external_id end def test_content assert_raise NoMethodError do @dtd.content end end def test_empty_attributes dtd = Nokogiri::HTML("").internal_subset assert_equal Hash.new, dtd.attributes end def test_attributes assert_equal ['width'], @dtd.attributes.keys assert_equal '0', @dtd.attributes['width'].default end def test_keys assert_equal ['width'], @dtd.keys end def test_each hash = {} @dtd.each { |key, value| hash[key] = value } assert_equal @dtd.attributes, hash end def test_namespace assert_raise NoMethodError do @dtd.namespace end end def test_namespace_definitions assert_raise NoMethodError do
@dtd.namespace_definitions end end def test_line assert_raise NoMethodError do @dtd.line end end def test_validate if Nokogiri.uses_libxml? list = @xml.internal_subset.validate @xml assert_equal 44, list.length else xml = Nokogiri::XML(File.open(XML_FILE)) {|cfg| cfg.dtdvalid} list = xml.internal_subset.validate xml assert_equal 40, list.length end end def test_external_subsets assert subset = @xml.internal_subset assert_equal 'staff', subset.name end def test_entities assert entities = @dtd.entities assert_equal %w[ ent1 ent2 ent3 ent4 ent5 ].sort, entities.keys.sort end def test_elements assert elements = @dtd.elements assert_equal %w[ br ], elements.keys assert_equal 'br', elements['br'].name end def test_notations assert notations = @dtd.notations assert_equal %w[ notation1 notation2 ].sort, notations.keys.sort assert notation1 = notations['notation1'] assert_equal 'notation1', notation1.name assert_equal 'notation1File', notation1.public_id assert_nil notation1.system_id end end end end
# nokogiri-1.6.1/test/xml/test_xinclude.rb
require "helper" module Nokogiri module XML class TestXInclude < Nokogiri::TestCase def setup super @xml = Nokogiri::XML.parse(File.read(XML_XINCLUDE_FILE), XML_XINCLUDE_FILE) @included = "this snippet is to be included from xinclude.xml" end def test_xinclude_on_document_parse skip("Pure Java version XInclude has a conflict with NekoDTD setting. This will be fixed later.") if Nokogiri.jruby?
# first test that xinclude works when requested
xml_doc = nil File.open(XML_XINCLUDE_FILE) do |fp| xml_doc = Nokogiri::XML(fp) do |conf| conf.strict.dtdload.noent.nocdata.xinclude end end assert_not_nil xml_doc assert_not_nil included = xml_doc.at_xpath('//included') assert_equal @included, included.content
# no xinclude should happen when not requested
xml_doc = nil File.open(XML_XINCLUDE_FILE) do |fp| xml_doc = Nokogiri::XML(fp) do |conf| conf.strict.dtdload.noent.nocdata end end assert_not_nil xml_doc assert_nil xml_doc.at_xpath('//included') end def test_xinclude_on_document_node skip("Pure Java version turns XInclude on against a parser.") if Nokogiri.jruby? assert_nil @xml.at_xpath('//included') @xml.do_xinclude assert_not_nil included = @xml.at_xpath('//included') assert_equal @included, included.content end def test_xinclude_on_element_subtree skip("Pure Java version turns XInclude on against a parser.") if Nokogiri.jruby? assert_nil @xml.at_xpath('//included') @xml.root.do_xinclude assert_not_nil included = @xml.at_xpath('//included') assert_equal @included, included.content end def test_do_xinclude_accepts_block non_default_options = Nokogiri::XML::ParseOptions::NOBLANKS | \
Nokogiri::XML::ParseOptions::XINCLUDE @xml.do_xinclude(non_default_options) do |options| assert_equal non_default_options, options.to_i end end def test_include_nonexistent_throws_exception skip("Pure Java version behaves differently") if Nokogiri.jruby?
# break inclusion deliberately
@xml.at_xpath('//xi:include')['href'] = "nonexistent.xml" exception_raised = false begin @xml.do_xinclude { |opts| opts.nowarning } rescue Exception => e assert_equal Nokogiri::XML::SyntaxError, e.class exception_raised = true end assert exception_raised end end end end
# nokogiri-1.6.1/test/xml/test_document_encoding.rb
require "helper" module Nokogiri module XML if RUBY_VERSION =~ /^1\.9/ class TestDocumentEncoding < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE, 'UTF-8') end def test_url assert_equal @xml.encoding, @xml.url.encoding.name end def test_encoding assert_equal @xml.encoding, @xml.encoding.encoding.name end def test_dotted_version if Nokogiri.uses_libxml? assert_equal 'UTF-8', Nokogiri::LIBXML_VERSION.encoding.name end end end end end end
# nokogiri-1.6.1/test/xml/sax/test_push_parser.rb
# -*- coding: utf-8 -*-
require "helper" module Nokogiri module XML module SAX class TestPushParser < Nokogiri::SAX::TestCase def setup super @parser = XML::SAX::PushParser.new(Doc.new) end def test_exception assert_raises(SyntaxError) do @parser << "" end assert_raises(SyntaxError) do @parser << nil end end def test_end_document_called @parser.<<(<<-eoxml)

      Paragraph 1

      eoxml assert ! @parser.document.end_document_called @parser.finish assert @parser.document.end_document_called end def test_start_element @parser.<<(<<-eoxml)

      eoxml assert_equal [["p", [["id", "asdfasdf"]]]], @parser.document.start_elements @parser.<<(<<-eoxml) Paragraph 1

      eoxml assert_equal [' This is a comment '], @parser.document.comments @parser.finish end def test_start_element_with_namespaces @parser.<<(<<-eoxml)

      eoxml assert_equal [["p", [["xmlns:foo", "http://foo.example.com/"]]]], @parser.document.start_elements @parser.<<(<<-eoxml) Paragraph 1

      eoxml assert_equal [' This is a comment '], @parser.document.comments @parser.finish end def test_start_element_ns @parser.<<(<<-eoxml) eoxml assert_equal 1, @parser.document.start_elements_namespace.length el = @parser.document.start_elements_namespace.first assert_equal 'stream', el.first assert_equal 2, el[1].length assert_equal [['version', '1.0'], ['size', 'large']], el[1].map { |x| [x.localname, x.value] } assert_equal 'stream', el[2] assert_equal 'http://etherx.jabber.org/streams', el[3] @parser.finish end def test_end_element_ns @parser.<<(<<-eoxml) eoxml assert_equal [['stream', 'stream', 'http://etherx.jabber.org/streams']], @parser.document.end_elements_namespace @parser.finish end def test_chevron_partial_xml @parser.<<(<<-eoxml)

      eoxml @parser.<<(<<-eoxml) Paragraph 1

      eoxml assert_equal [' This is a comment '], @parser.document.comments @parser.finish end def test_chevron @parser.<<(<<-eoxml)

      Paragraph 1

      eoxml @parser.finish assert_equal [' This is a comment '], @parser.document.comments end def test_default_options assert_equal 0, @parser.options end def test_recover @parser.options |= XML::ParseOptions::RECOVER @parser.<<(<<-eoxml)

      Foo Bar

      eoxml @parser.finish assert(@parser.document.errors.size >= 1) assert_equal [["p", []], ["bar", []]], @parser.document.start_elements assert_equal "FooBar", @parser.document.data.map { |x| x.gsub(/\s/, '') }.join end def test_broken_encoding skip("ultra hard to fix for pure Java version") if Nokogiri.jruby? @parser.options |= XML::ParseOptions::RECOVER
# This is ISO_8859-1:
@parser.<< "Gau\337" @parser.finish assert(@parser.document.errors.size >= 1) assert_equal "Gau\337", @parser.document.data.join assert_equal [["r"]], @parser.document.end_elements end end end end end
# nokogiri-1.6.1/test/xml/sax/test_parser_context.rb
# -*- coding: utf-8 -*-
require "helper" module Nokogiri module XML module SAX class TestParserContext < Nokogiri::SAX::TestCase def setup @xml = ' world ' end class Counter < Nokogiri::XML::SAX::Document attr_accessor :context, :lines, :columns def initialize @context = nil @lines = [] @columns = [] end def start_element name, attrs = [] @lines << [name, context.line] @columns << [name, context.column] end end def test_line_numbers sax_handler = Counter.new parser = Nokogiri::XML::SAX::Parser.new(sax_handler) parser.parse(@xml) do |ctx| sax_handler.context = ctx end assert_equal [["hello", 1], ["inter", 4], ["net", 5]], sax_handler.lines end def test_column_numbers sax_handler = Counter.new parser = Nokogiri::XML::SAX::Parser.new(sax_handler) parser.parse(@xml) do |ctx| sax_handler.context = ctx end assert_equal [["hello", 7], ["inter", 7], ["net", 9]], sax_handler.columns end def test_replace_entities pc = ParserContext.new StringIO.new(''), 'UTF-8' pc.replace_entities = false assert_equal false, pc.replace_entities pc.replace_entities = true assert_equal true, pc.replace_entities end def test_from_io ctx = ParserContext.new StringIO.new('fo'), 'UTF-8' assert ctx end def test_from_string assert ParserContext.new 'blah blah' end def test_parse_with ctx = ParserContext.new
'blah' assert_raises ArgumentError do ctx.parse_with nil end end def test_parse_with_sax_parser xml = "" ctx = ParserContext.new xml parser = Parser.new Doc.new assert_nil ctx.parse_with parser end def test_from_file ctx = ParserContext.file XML_FILE parser = Parser.new Doc.new assert_nil ctx.parse_with parser end def test_parse_with_returns_nil xml = "" ctx = ParserContext.new xml parser = Parser.new Doc.new assert_nil ctx.parse_with(parser) end end end end end
# nokogiri-1.6.1/test/xml/sax/test_parser.rb
# -*- coding: utf-8 -*-
require "helper" module Nokogiri module XML module SAX class TestParser < Nokogiri::SAX::TestCase def setup super @parser = XML::SAX::Parser.new(Doc.new) end def test_parser_context_yielded_io doc = Doc.new parser = XML::SAX::Parser.new doc xml = "" block_called = false parser.parse(StringIO.new(xml)) { |ctx| block_called = true ctx.replace_entities = true } assert block_called assert_equal [['foo', [['a', '&b']]]], doc.start_elements end def test_parser_context_yielded_in_memory doc = Doc.new parser = XML::SAX::Parser.new doc xml = "" block_called = false parser.parse(xml) { |ctx| block_called = true ctx.replace_entities = true } assert block_called assert_equal [['foo', [['a', '&b']]]], doc.start_elements end def test_xml_decl { '' => nil, '' => ['1.0'], '' => ['1.0', 'UTF-8'], '' => ['1.0', 'yes'], '' => ['1.0', 'no'], }.each do |decl,value| parser = XML::SAX::Parser.new(Doc.new) xml = "#{decl}\n" parser.parse xml assert parser.document.start_document_called, xml assert_equal value, parser.document.xmldecls, xml end end def test_parse_empty assert_raises RuntimeError do @parser.parse('') end end def test_namespace_declaration_order_is_saved @parser.parse <<-eoxml eoxml assert_equal 2, @parser.document.start_elements_namespace.length el = @parser.document.start_elements_namespace.first namespaces = el.last assert_equal ['foo', 'http://foo.example.com/'], namespaces.first
assert_equal [nil, 'http://example.com/'], namespaces.last end def test_bad_document_calls_error_handler @parser.parse('') assert @parser.document.errors assert @parser.document.errors.length > 0 end def test_namespace_are_super_fun_to_parse @parser.parse <<-eoxml hello world eoxml assert @parser.document.start_elements_namespace.length > 0 el = @parser.document.start_elements_namespace[1] assert_equal 'a', el.first assert_equal 1, el[1].length attribute = el[1].first assert_equal 'bar', attribute.localname assert_equal 'foo', attribute.prefix assert_equal 'hello', attribute.value assert_equal 'http://foo.example.com/', attribute.uri end def test_sax_v1_namespace_attribute_declarations @parser.parse <<-eoxml hello world eoxml assert @parser.document.start_elements.length > 0 elm = @parser.document.start_elements.first assert_equal 'root', elm.first assert elm[1].include?(['xmlns:foo', 'http://foo.example.com/']) assert elm[1].include?(['xmlns', 'http://example.com/']) end def test_sax_v1_namespace_nodes @parser.parse <<-eoxml hello world eoxml assert_equal 5, @parser.document.start_elements.length assert @parser.document.start_elements.map { |se| se.first }.include?('foo:bar') assert @parser.document.end_elements.map { |se| se.first }.include?('foo:bar') end def test_start_is_called_without_namespace @parser.parse(<<-eoxml) eoxml assert_equal ['root', 'foo:f', 'bar'], @parser.document.start_elements.map { |x| x.first } end def test_parser_sets_encoding parser = XML::SAX::Parser.new(Doc.new, 'UTF-8') assert_equal 'UTF-8', parser.encoding end def test_errors_set_after_parsing_bad_dom doc = Nokogiri::XML('') assert doc.errors @parser.parse('') assert @parser.document.errors assert @parser.document.errors.length > 0 if RUBY_VERSION =~ /^1\.9/ doc.errors.each do |error| assert_equal 'UTF-8', error.message.encoding.name end end
# When using JRuby Nokogiri, more errors are generated because the DOM
# parser continues to parse an ill-formed document, while the SAX parser
# stops at the first error.
unless Nokogiri.jruby? assert_equal doc.errors.length, @parser.document.errors.length end end def test_parse_with_memory_argument @parser.parse(File.read(XML_FILE)) assert(@parser.document.cdata_blocks.length > 0) end def test_parse_with_io_argument File.open(XML_FILE, 'rb') { |f| @parser.parse(f) } assert(@parser.document.cdata_blocks.length > 0) end def test_parse_io call_parse_io_with_encoding 'UTF-8' end
# issue #828
def test_parse_io_lower_case_encoding call_parse_io_with_encoding 'utf-8' end def call_parse_io_with_encoding encoding File.open(XML_FILE, 'rb') { |f| @parser.parse_io(f, encoding) } assert(@parser.document.cdata_blocks.length > 0) if RUBY_VERSION =~ /^1\.9/ called = false @parser.document.start_elements.flatten.each do |thing| assert_equal 'UTF-8', thing.encoding.name called = true end assert called called = false @parser.document.end_elements.flatten.each do |thing| assert_equal 'UTF-8', thing.encoding.name called = true end assert called called = false @parser.document.data.each do |thing| assert_equal 'UTF-8', thing.encoding.name called = true end assert called called = false @parser.document.comments.flatten.each do |thing| assert_equal 'UTF-8', thing.encoding.name called = true end assert called called = false @parser.document.cdata_blocks.flatten.each do |thing| assert_equal 'UTF-8', thing.encoding.name called = true end assert called end end def test_parse_file @parser.parse_file(XML_FILE) assert_raises(ArgumentError) { @parser.parse_file(nil) } assert_raises(Errno::ENOENT) { @parser.parse_file('') } assert_raises(Errno::EISDIR) { @parser.parse_file(File.expand_path(File.dirname(__FILE__))) } end def test_render_parse_nil_param assert_raises(ArgumentError) { @parser.parse_memory(nil) } end def test_bad_encoding_args assert_raises(ArgumentError) { XML::SAX::Parser.new(Doc.new, 'not an encoding') } assert_raises(ArgumentError) { @parser.parse_io(StringIO.new(''), 'not an encoding')} end def test_ctag
@parser.parse_memory(<<-eoxml)

      <p>
        <![CDATA[ This is a comment ]]>
        Paragraph 1
      </p>

      eoxml assert_equal [' This is a comment '], @parser.document.cdata_blocks end def test_comment @parser.parse_memory(<<-eoxml)

      <p>
        <!-- This is a comment -->
        Paragraph 1
      </p>

      eoxml assert_equal [' This is a comment '], @parser.document.comments end def test_characters @parser.parse_memory(<<-eoxml)

      <p>Paragraph 1</p>

      eoxml assert_equal ['Paragraph 1'], @parser.document.data end def test_end_document @parser.parse_memory(<<-eoxml)

      <p>Paragraph 1</p>

      eoxml assert @parser.document.end_document_called end def test_end_element @parser.parse_memory(<<-eoxml)

      <p>Paragraph 1</p>

      eoxml assert_equal [["p"]], @parser.document.end_elements end def test_start_element_attrs @parser.parse_memory(<<-eoxml)

      <p id="asdfasdf">Paragraph 1</p>

      eoxml assert_equal [["p", [["id", "asdfasdf"]]]], @parser.document.start_elements end def test_start_element_attrs_include_namespaces @parser.parse_memory(<<-eoxml)

      <p xmlns:foo="http://foo.example.com/">Paragraph 1</p>

      eoxml assert_equal [["p", [['xmlns:foo', 'http://foo.example.com/']]]], @parser.document.start_elements end def test_processing_instruction @parser.parse_memory(<<-eoxml) eoxml assert_equal [['xml-stylesheet', 'href="a.xsl" type="text/xsl"']], @parser.document.processing_instructions end if Nokogiri.uses_libxml? # JRuby SAXParser only parses well-formed XML documents def test_parse_document @parser.parse_memory(<<-eoxml)

      <p>Paragraph 1</p>
      <p>Paragraph 2</p>

      eoxml end end def test_parser_attributes xml = <<-eoxml eoxml block_called = false @parser.parse(xml) { |ctx| block_called = true ctx.replace_entities = true } assert block_called assert_equal [['root', []], ['foo', [['a', '&b'], ['c', '>d']]]], @parser.document.start_elements end end end end end nokogiri-1.6.1/test/xml/node/0000755000175000017500000000000012261213762015424 5ustar boutilboutilnokogiri-1.6.1/test/xml/node/test_subclass.rb0000644000175000017500000000303112261213762020624 0ustar boutilboutilrequire "helper" module Nokogiri module XML class Node class TestSubclass < Nokogiri::TestCase { Nokogiri::XML::CDATA => 'doc, "foo"', Nokogiri::XML::Attr => 'doc, "foo"', Nokogiri::XML::Comment => 'doc, "foo"', Nokogiri::XML::EntityReference => 'doc, "foo"', Nokogiri::XML::ProcessingInstruction => 'doc, "foo", "bar"', Nokogiri::XML::DocumentFragment => 'doc', Nokogiri::XML::Node => '"foo", doc', Nokogiri::XML::Text => '"foo", doc', }.each do |klass, constructor| class_eval %{ def test_subclass_#{klass.name.gsub('::', '_')} doc = Nokogiri::XML::Document.new klass = Class.new(#{klass.name}) node = klass.new(#{constructor}) assert_instance_of klass, node end } class_eval <<-eocode, __FILE__, __LINE__ + 1 def test_subclass_initialize_#{klass.name.gsub('::', '_')} doc = Nokogiri::XML::Document.new klass = Class.new(#{klass.name}) do attr_accessor :initialized_with def initialize *args @initialized_with = args end end node = klass.new(#{constructor}, 1) assert_equal [#{constructor}, 1], node.initialized_with end eocode end end end end end nokogiri-1.6.1/test/xml/node/test_save_options.rb0000644000175000017500000000143712261213762021526 0ustar boutilboutilrequire "helper" module Nokogiri module XML class Node class TestSaveOptions < Nokogiri::TestCase SaveOptions.constants.each do |constant| class_eval <<-EOEVAL def test_predicate_#{constant.downcase} options = SaveOptions.new(SaveOptions::#{constant}) assert options.#{constant.downcase}? 
assert SaveOptions.new.#{constant.downcase}.#{constant.downcase}? end EOEVAL end def test_default_xml_save_options if Nokogiri.jruby? assert_equal 0, (SaveOptions::DEFAULT_XML & SaveOptions::FORMAT) else assert_equal SaveOptions::FORMAT, (SaveOptions::DEFAULT_XML & SaveOptions::FORMAT) end end end end end end nokogiri-1.6.1/test/xml/test_document.rb0000644000175000017500000006212712261213762017711 0ustar boutilboutilrequire "helper" require 'uri' module Nokogiri module XML class TestDocument < Nokogiri::TestCase URI = if URI.const_defined?(:DEFAULT_PARSER) ::URI::DEFAULT_PARSER else ::URI end def setup super @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) end def test_dtd_with_empty_internal_subset doc = Nokogiri::XML <<-eoxml eoxml assert doc.root end # issue #838 def test_document_with_invalid_prolog doc = Nokogiri::XML '' assert_empty doc.content end # issue #837 def test_document_with_refentity doc = Nokogiri::XML '&' assert_equal '', doc.content end # issue #835 def test_manually_adding_reference_entities d = Nokogiri::XML::Document.new root = Nokogiri::XML::Element.new('bar', d) txt = Nokogiri::XML::Text.new('foo', d) ent = Nokogiri::XML::EntityReference.new(d, '#8217') root << txt root << ent d << root assert_match /’/, d.to_html end def test_document_with_initial_space doc = Nokogiri::XML(" ") assert_equal 2, doc.children.size end def test_root_set_to_nil @xml.root = nil assert_equal nil, @xml.root end def test_million_laugh_attach doc = Nokogiri::XML ' ]> &lol9;' assert_not_nil doc end def test_million_laugh_attach_2 doc = Nokogiri::XML ' ]> &a; ' assert_not_nil doc end def test_ignore_unknown_namespace doc = Nokogiri::XML(<<-eoxml) eoxml if Nokogiri.jruby? 
refute doc.xpath('//foo').first.namespace # assert that the namespace is nil end refute_empty doc.xpath('//bar'), "bar wasn't found in the document" # bar should be part of the doc end def test_collect_namespaces doc = Nokogiri::XML(<<-eoxml) eoxml assert_equal({"xmlns"=>"hello", "xmlns:foo"=>"world"}, doc.collect_namespaces) end def test_subclass_initialize_modify # testing a segv Class.new(Nokogiri::XML::Document) { def initialize super body_node = Nokogiri::XML::Node.new "body", self body_node.content = "stuff" self.root = body_node end }.new end def test_create_text_node txt = @xml.create_text_node 'foo' assert_instance_of Nokogiri::XML::Text, txt assert_equal 'foo', txt.text assert_equal @xml, txt.document end def test_create_text_node_with_block @xml.create_text_node 'foo' do |txt| assert_instance_of Nokogiri::XML::Text, txt assert_equal 'foo', txt.text assert_equal @xml, txt.document end end def test_create_element elm = @xml.create_element('foo') assert_instance_of Nokogiri::XML::Element, elm assert_equal 'foo', elm.name assert_equal @xml, elm.document end def test_create_element_with_block @xml.create_element('foo') do |elm| assert_instance_of Nokogiri::XML::Element, elm assert_equal 'foo', elm.name assert_equal @xml, elm.document end end def test_create_element_with_attributes elm = @xml.create_element('foo',:a => "1") assert_instance_of Nokogiri::XML::Element, elm assert_instance_of Nokogiri::XML::Attr, elm.attributes["a"] assert_equal "1", elm["a"] end def test_create_element_with_namespace elm = @xml.create_element('foo',:'xmlns:foo' => 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', elm.namespaces['xmlns:foo'] end def test_create_element_with_hyphenated_namespace elm = @xml.create_element('foo',:'xmlns:SOAP-ENC' => 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', elm.namespaces['xmlns:SOAP-ENC'] end def test_create_element_with_content elm = @xml.create_element('foo',"needs more xml/violence") 
assert_equal "needs more xml/violence", elm.content end def test_create_cdata cdata = @xml.create_cdata("abc") assert_instance_of Nokogiri::XML::CDATA, cdata assert_equal "abc", cdata.content end def test_create_cdata_with_block @xml.create_cdata("abc") do |cdata| assert_instance_of Nokogiri::XML::CDATA, cdata assert_equal "abc", cdata.content end end def test_create_comment comment = @xml.create_comment("abc") assert_instance_of Nokogiri::XML::Comment, comment assert_equal "abc", comment.content end def test_create_comment_with_block @xml.create_comment("abc") do |comment| assert_instance_of Nokogiri::XML::Comment, comment assert_equal "abc", comment.content end end def test_pp out = StringIO.new('') ::PP.pp @xml, out assert_operator out.string.length, :>, 0 end def test_create_internal_subset_on_existing_subset assert_not_nil @xml.internal_subset assert_raises(RuntimeError) do @xml.create_internal_subset('staff', nil, 'staff.dtd') end end def test_create_internal_subset xml = Nokogiri::XML('') assert_nil xml.internal_subset xml.create_internal_subset('name', nil, 'staff.dtd') ss = xml.internal_subset assert_equal 'name', ss.name assert_nil ss.external_id assert_equal 'staff.dtd', ss.system_id end def test_external_subset assert_nil @xml.external_subset Dir.chdir(ASSETS_DIR) do @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) { |cfg| cfg.dtdload } end assert @xml.external_subset end def test_create_external_subset_fails_with_existing_subset assert_nil @xml.external_subset Dir.chdir(ASSETS_DIR) do @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) { |cfg| cfg.dtdload } end assert @xml.external_subset assert_raises(RuntimeError) do @xml.create_external_subset('staff', nil, 'staff.dtd') end end def test_create_external_subset dtd = @xml.create_external_subset('staff', nil, 'staff.dtd') assert_nil dtd.external_id assert_equal 'staff.dtd', dtd.system_id assert_equal 'staff', dtd.name assert_equal dtd, @xml.external_subset end def test_version 
assert_equal '1.0', @xml.version end def test_add_namespace assert_raise NoMethodError do @xml.add_namespace('foo', 'bar') end end def test_attributes assert_raise NoMethodError do @xml.attributes end end def test_namespace assert_raise NoMethodError do @xml.namespace end end def test_namespace_definitions assert_raise NoMethodError do @xml.namespace_definitions end end def test_line assert_raise NoMethodError do @xml.line end end def test_empty_node_converted_to_html_is_not_self_closing doc = Nokogiri::XML('
      ') assert_equal "", doc.inner_html end def test_fragment fragment = @xml.fragment assert_equal 0, fragment.children.length end def test_add_child_fragment_with_single_node doc = Nokogiri::XML::Document.new fragment = doc.fragment('') doc.add_child fragment assert_equal '/hello', doc.at('//hello').path assert_equal 'hello', doc.root.name end def test_add_child_fragment_with_multiple_nodes doc = Nokogiri::XML::Document.new fragment = doc.fragment('') assert_raises(RuntimeError) do doc.add_child fragment end end def test_add_child_with_multiple_roots assert_raises(RuntimeError) do @xml << Node.new('foo', @xml) end end def test_add_child_with_string doc = Nokogiri::XML::Document.new doc.add_child "
      quack!
      " assert_equal 1, doc.root.children.length assert_equal "quack!", doc.root.children.first.content end def test_move_root_to_document_with_no_root sender = Nokogiri::XML('foo') newdoc = Nokogiri::XML::Document.new newdoc.root = sender.root end def test_move_root_with_existing_root_gets_gcd doc = Nokogiri::XML('test') doc2 = Nokogiri::XML("#{'x' * 5000000}") doc2.root = doc.root end def test_validate if Nokogiri.uses_libxml? assert_equal 44, @xml.validate.length else xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) {|cfg| cfg.dtdvalid} assert_equal 40, xml.validate.length end end def test_validate_no_internal_subset doc = Nokogiri::XML('') assert_nil doc.validate end def test_clone assert @xml.clone end def test_document_should_not_have_default_ns doc = Nokogiri::XML::Document.new assert_raises NoMethodError do doc.default_namespace = 'http://innernet.com/' end assert_raises NoMethodError do doc.add_namespace_definition('foo', 'bar') end end def test_parse_handles_nil_gracefully @doc = Nokogiri::XML::Document.parse(nil) assert_instance_of Nokogiri::XML::Document, @doc end def test_parse_takes_block options = nil Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) do |cfg| options = cfg end assert options end def test_parse_yields_parse_options options = nil Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) do |cfg| options = cfg options.nonet.nowarning.dtdattr end assert options.nonet? assert options.nowarning? assert options.dtdattr? end def test_XML_takes_block options = nil Nokogiri::XML(File.read(XML_FILE), XML_FILE) do |cfg| options = cfg options.nonet.nowarning.dtdattr end assert options.nonet? assert options.nowarning? assert options.dtdattr? 
end def test_subclass klass = Class.new(Nokogiri::XML::Document) doc = klass.new assert_instance_of klass, doc end def test_subclass_initialize klass = Class.new(Nokogiri::XML::Document) do attr_accessor :initialized_with def initialize(*args) @initialized_with = args end end doc = klass.new("1.0", 1) assert_equal ["1.0", 1], doc.initialized_with end def test_subclass_dup klass = Class.new(Nokogiri::XML::Document) doc = klass.new.dup assert_instance_of klass, doc end def test_subclass_parse klass = Class.new(Nokogiri::XML::Document) doc = klass.parse(File.read(XML_FILE)) # lame hack uses root to avoid comparing DOCTYPE tags which can appear out of order. # I should really finish lorax and use that here. assert_equal @xml.root.to_s, doc.root.to_s assert_instance_of klass, doc end def test_document_parse_method xml = Nokogiri::XML::Document.parse(File.read(XML_FILE)) # lame hack uses root to avoid comparing DOCTYPE tags which can appear out of order. # I should really finish lorax and use that here. assert_equal @xml.root.to_s, xml.root.to_s end def test_encoding= @xml.encoding = 'UTF-8' assert_match 'UTF-8', @xml.to_xml @xml.encoding = 'EUC-JP' assert_match 'EUC-JP', @xml.to_xml end def test_namespace_should_not_exist assert_raises(NoMethodError) { @xml.namespace } end def test_non_existant_function # WTF. I don't know why this is different between MRI and Jruby # They should be the same... Either way, raising an exception # is the correct thing to do. exception = RuntimeError if !Nokogiri.uses_libxml? || (Nokogiri.uses_libxml? 
&& Nokogiri::VERSION_INFO['libxml']['platform'] == 'jruby') exception = Nokogiri::XML::XPath::SyntaxError end assert_raises(exception) { @xml.xpath('//name[foo()]') } end def test_ancestors assert_equal 0, @xml.ancestors.length end def test_root_node_parent_is_document parent = @xml.root.parent assert_equal @xml, parent assert_instance_of Nokogiri::XML::Document, parent end def test_xmlns_is_automatically_registered doc = Nokogiri::XML(<<-eoxml) bar eoxml assert_equal 1, doc.css('xmlns|foo').length assert_equal 1, doc.css('foo').length assert_equal 0, doc.css('|foo').length assert_equal 1, doc.xpath('//xmlns:foo').length assert_equal 1, doc.search('xmlns|foo').length assert_equal 1, doc.search('//xmlns:foo').length assert doc.at('xmlns|foo') assert doc.at('//xmlns:foo') assert doc.at('foo') end def test_xmlns_is_registered_for_nodesets doc = Nokogiri::XML(<<-eoxml) baz eoxml assert_equal 1, doc.css('xmlns|foo').css('xmlns|bar').length assert_equal 1, doc.css('foo').css('bar').length assert_equal 1, doc.xpath('//xmlns:foo').xpath('./xmlns:bar').length assert_equal 1, doc.search('xmlns|foo').search('xmlns|bar').length assert_equal 1, doc.search('//xmlns:foo').search('./xmlns:bar').length end def test_to_xml_with_indent doc = Nokogiri::XML('') doc = Nokogiri::XML(doc.to_xml(:indent => 5)) assert_indent 5, doc end def test_write_xml_to_with_indent io = StringIO.new doc = Nokogiri::XML('') doc.write_xml_to io, :indent => 5 io.rewind doc = Nokogiri::XML(io.read) assert_indent 5, doc end # wtf... osx's libxml sucks. unless !Nokogiri.uses_libxml? 
|| Nokogiri::LIBXML_VERSION =~ /^2\.6\./ def test_encoding xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE, 'UTF-8') assert_equal 'UTF-8', xml.encoding end end def test_memory_explosion_on_invalid_xml doc = Nokogiri::XML("<<<") refute_nil doc refute_empty doc.errors end def test_document_has_errors doc = Nokogiri::XML(<<-eoxml) eoxml assert doc.errors.length > 0 doc.errors.each do |error| assert_match error.message, error.inspect assert_match error.message, error.to_s end end def test_strict_document_throws_syntax_error assert_raises(Nokogiri::XML::SyntaxError) { Nokogiri::XML('', nil, nil, 0) } assert_raises(Nokogiri::XML::SyntaxError) { Nokogiri::XML('') { |cfg| cfg.strict } } assert_raises(Nokogiri::XML::SyntaxError) { Nokogiri::XML(StringIO.new('')) { |cfg| cfg.strict } } end def test_XML_function xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE) assert xml.xml? end def test_url assert @xml.url assert_equal XML_FILE, URI.unescape(@xml.url).sub('file:///', '') end def test_document_parent xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE) assert_raises(NoMethodError) { xml.parent } end def test_document_name xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE) assert_equal 'document', xml.name end def test_parse_can_take_io xml = nil File.open(XML_FILE, 'rb') { |f| xml = Nokogiri::XML(f) } assert xml.xml? set = xml.search('//employee') assert set.length > 0 end def test_parsing_empty_io doc = Nokogiri::XML.parse(StringIO.new('')) refute_nil doc end def test_search_on_empty_documents doc = Nokogiri::XML::Document.new ns = doc.search('//foo') assert_equal 0, ns.length ns = doc.css('foo') assert_equal 0, ns.length ns = doc.xpath('//foo') assert_equal 0, ns.length end def test_bad_xpath_raises_syntax_error assert_raises(XML::XPath::SyntaxError) { @xml.xpath('\\') } end def test_find_with_namespace doc = Nokogiri::XML.parse(<<-eoxml) snuggles! 
eoxml ctx = Nokogiri::XML::XPathContext.new(doc) ctx.register_ns 'tenderlove', 'http://tenderlovemaking.com/' set = ctx.evaluate('//tenderlove:foo') assert_equal 1, set.length assert_equal 'foo', set.first.name # It looks like only the URI is important: ctx = Nokogiri::XML::XPathContext.new(doc) ctx.register_ns 'america', 'http://tenderlovemaking.com/' set = ctx.evaluate('//america:foo') assert_equal 1, set.length assert_equal 'foo', set.first.name # Its so important that a missing slash will cause it to return nothing ctx = Nokogiri::XML::XPathContext.new(doc) ctx.register_ns 'america', 'http://tenderlovemaking.com' set = ctx.evaluate('//america:foo') assert_equal 0, set.length end def test_xml? assert @xml.xml? end def test_document assert @xml.document end def test_singleton_methods assert node_set = @xml.search('//name') assert node_set.length > 0 node = node_set.first def node.test 'test' end assert node_set = @xml.search('//name') assert_equal 'test', node_set.first.test end def test_multiple_search assert node_set = @xml.search('//employee', '//name') employees = @xml.search('//employee') names = @xml.search('//name') assert_equal(employees.length + names.length, node_set.length) end def test_node_set_index assert node_set = @xml.search('//employee') assert_equal(5, node_set.length) assert node_set[4] assert_nil node_set[5] end def test_search assert node_set = @xml.search('//employee') assert_equal(5, node_set.length) node_set.each do |node| assert_equal('employee', node.name) end end def test_dump assert @xml.serialize assert @xml.to_xml end def test_dup dup = @xml.dup assert_instance_of Nokogiri::XML::Document, dup assert dup.xml?, 'duplicate should be xml' end def test_new doc = nil doc = Nokogiri::XML::Document.new assert doc assert doc.xml? assert_nil doc.root end def test_set_root doc = nil doc = Nokogiri::XML::Document.new assert doc assert doc.xml? 
assert_nil doc.root node = Nokogiri::XML::Node.new("b", doc) { |n| n.content = 'hello world' } assert_equal('hello world', node.content) doc.root = node assert_equal(node, doc.root) end def test_remove_namespaces doc = Nokogiri::XML <<-EOX hello from a hello from b hello from c EOX namespaces = doc.root.namespaces # assert on setup assert_equal 2, doc.root.namespaces.length assert_equal 3, doc.at_xpath("//container").namespaces.length assert_equal 0, doc.xpath("//foo").length assert_equal 1, doc.xpath("//a:foo").length assert_equal 1, doc.xpath("//a:foo").length assert_equal 1, doc.xpath("//x:foo", "x" => "http://c.flavorjon.es/").length assert_match %r{foo c:attr}, doc.to_xml doc.remove_namespaces! assert_equal 0, doc.root.namespaces.length assert_equal 0, doc.at_xpath("//container").namespaces.length assert_equal 3, doc.xpath("//foo").length assert_equal 0, doc.xpath("//a:foo", namespaces).length assert_equal 0, doc.xpath("//a:foo", namespaces).length assert_equal 0, doc.xpath("//x:foo", "x" => "http://c.flavorjon.es/").length assert_match %r{foo attr}, doc.to_xml end # issue #785 def test_attribute_decoration decorator = Module.new do def test_method end end util_decorate(@xml, decorator) assert @xml.search('//@street').first.respond_to?(:test_method) end def test_subset_is_decorated x = Module.new do def awesome! end end util_decorate(@xml, x) assert @xml.respond_to?(:awesome!) assert node_set = @xml.search('//staff') assert node_set.respond_to?(:awesome!) assert subset = node_set.search('.//employee') assert subset.respond_to?(:awesome!) assert sub_subset = node_set.search('.//name') assert sub_subset.respond_to?(:awesome!) end def test_decorator_is_applied x = Module.new do def awesome! end end util_decorate(@xml, x) assert @xml.respond_to?(:awesome!) assert node_set = @xml.search('//employee') assert node_set.respond_to?(:awesome!) node_set.each do |node| assert node.respond_to?(:awesome!), node.class end assert @xml.root.respond_to?(:awesome!) 
assert @xml.children.respond_to?(:awesome!) end if Nokogiri.jruby? def wrap_java_document require 'java' factory = javax.xml.parsers.DocumentBuilderFactory.newInstance builder = factory.newDocumentBuilder document = builder.newDocument root = document.createElement("foo") document.appendChild(root) Nokogiri::XML::Document.wrap(document) end end def test_java_integration skip("Ruby doesn't have the wrap method") unless Nokogiri.jruby? noko_doc = wrap_java_document assert_equal 'foo', noko_doc.root.name noko_doc = Nokogiri::XML(<
      eoxml dom = noko_doc.to_java assert dom.kind_of? org.w3c.dom.Document assert_equal 'foo', dom.getDocumentElement().getTagName() end def test_add_child skip("Ruby doesn't have the wrap method") unless Nokogiri.jruby? doc = wrap_java_document doc.root.add_child "" end def test_can_be_closed f = File.open XML_FILE Nokogiri::XML f f.close end end end end nokogiri-1.6.1/test/xml/test_processing_instruction.rb0000644000175000017500000000133412261213762022701 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestProcessingInstruction < Nokogiri::TestCase def setup super @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) end def test_type assert_equal(Node::PI_NODE, @xml.children[0].type) end def test_name assert_equal 'TEST-STYLE', @xml.children[0].name end def test_new assert ref = ProcessingInstruction.new(@xml, 'name', 'content') assert_instance_of ProcessingInstruction, ref end def test_many_new 100.times { ProcessingInstruction.new(@xml, 'foo', 'bar') } @xml.root << ProcessingInstruction.new(@xml, 'foo', 'bar') end end end end nokogiri-1.6.1/test/xml/test_node_attributes.rb0000644000175000017500000000530612261213762021262 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestNodeAttributes < Nokogiri::TestCase def test_attribute_with_ns doc = Nokogiri::XML <<-eoxml eoxml node = doc.at('node') assert_equal 'bar', node.attribute_with_ns('foo', 'http://tenderlovemaking.com/').value end def test_prefixed_attributes doc = Nokogiri::XML "" node = doc.root assert_equal 'en-GB', node['xml:lang'] assert_equal 'en-GB', node.attributes['lang'].value assert_equal nil, node['lang'] end def test_set_prefixed_attributes doc = Nokogiri::XML %Q{} node = doc.root node['xml:lang'] = 'en-GB' node['foo:bar'] = 'bazz' assert_equal 'en-GB', node['xml:lang'] assert_equal 'en-GB', node.attributes['lang'].value assert_equal nil, node['lang'] assert_equal 'http://www.w3.org/XML/1998/namespace', node.attributes['lang'].namespace.href 
assert_equal 'bazz', node['foo:bar'] assert_equal 'bazz', node.attributes['bar'].value assert_equal nil, node['bar'] assert_equal 'x', node.attributes['bar'].namespace.href end def test_append_child_namespace_definitions_prefixed_attributes doc = Nokogiri::XML "" node = doc.root node['xml:lang'] = 'en-GB' assert_equal [], node.namespace_definitions.map(&:prefix) child_node = Nokogiri::XML::Node.new 'foo', doc node << child_node assert_equal [], node.namespace_definitions.map(&:prefix) end def test_namespace_key? doc = Nokogiri::XML <<-eoxml eoxml node = doc.at('node') assert node.namespaced_key?('foo', 'http://tenderlovemaking.com/') assert node.namespaced_key?('foo', nil) assert !node.namespaced_key?('foo', 'foo') end def test_set_attribute_frees_nodes # testing a segv skip("JRuby doesn't do GC.") if Nokogiri.jruby? document = Nokogiri::XML.parse("") node = document.root node['visible'] = 'foo' attribute = node.attribute('visible') text = Nokogiri::XML::Text.new 'bar', document attribute.add_child(text) begin gc_previous = GC.stress GC.stress = true node['visible'] = 'attr' ensure GC.stress = gc_previous end end end end end nokogiri-1.6.1/test/xml/test_parse_options.rb0000644000175000017500000000332212261213762020750 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestParseOptions < Nokogiri::TestCase def test_new options = Nokogiri::XML::ParseOptions.new assert_equal 0, options.options end def test_to_i options = Nokogiri::XML::ParseOptions.new assert_equal 0, options.to_i end ParseOptions.constants.each do |constant| next if constant == 'STRICT' class_eval %{ def test_predicate_#{constant.downcase} options = ParseOptions.new(ParseOptions::#{constant}) assert options.#{constant.downcase}? assert ParseOptions.new.#{constant.downcase}.#{constant.downcase}? end } end def test_strict_noent options = ParseOptions.new.recover.noent assert !options.strict? 
end def test_new_with_argument options = Nokogiri::XML::ParseOptions.new 1 << 1 assert_equal 1 << 1, options.options end def test_unsetting options = Nokogiri::XML::ParseOptions.new Nokogiri::XML::ParseOptions::DEFAULT_HTML assert options.nonet? assert options.recover? options.nononet.norecover assert ! options.nonet? assert ! options.recover? options.nonet.recover assert options.nonet? assert options.recover? end def test_chaining options = Nokogiri::XML::ParseOptions.new.recover.noent assert options.recover? assert options.noent? end def test_inspect options = Nokogiri::XML::ParseOptions.new.recover.noent ins = options.inspect assert_match(/recover/, ins) assert_match(/noent/, ins) end end end end nokogiri-1.6.1/test/xml/test_xpath.rb0000644000175000017500000002271312261213762017214 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestXPath < Nokogiri::TestCase # ** WHY ALL THOSE _if Nokogiri.uses_libxml?_ ** # Hi, my dear readers, # # After reading these tests you may be wondering why all those ugly # if Nokogiri.uses_libxml? checks are scattered over the whole document. Well, let # me explain. When using XPath in Java, you need the extension # functions to be in a namespace. This is not required by XPath, afaik, # but it is a common convention. # # Furthermore, CSS does not support extension functions, but Nokogiri's CSS # does. Result: you cannot use them in the JRuby impl. At least, not until # the CSS to XPath parser is patched, and let me say that there are more # important features to add before that happens. I hope you will forgive # me. # # Yours truly, # # The guy whose headaches belong to Nokogiri JRuby impl. def setup super @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) @ns = @xml.root.namespaces # TODO: Maybe I should move this to the original code. 
@ns["nokogiri"] = "http://www.nokogiri.org/default_ns/ruby/extensions_functions" @handler = Class.new { attr_reader :things def initialize @things = [] end def thing thing @things << thing thing end def returns_array node_set @things << node_set.to_a node_set.to_a end def my_filter set, attribute, value set.find_all { |x| x[attribute] == value } end def saves_node_set node_set @things = node_set end def value 123.456 end }.new end def test_variable_binding assert_equal 4, @xml.xpath('//address[@domestic=$value]', nil, :value => 'Yes').length end def test_unknown_attribute assert_equal 0, @xml.xpath('//employee[@id="asdfasdf"]/@fooo').length assert_nil @xml.xpath('//employee[@id="asdfasdf"]/@fooo')[0] end def test_boolean assert_equal false, @xml.xpath('1 = 2') end def test_number assert_equal 2, @xml.xpath('1 + 1') end def test_string assert_equal 'foo', @xml.xpath('concat("fo", "o")') end def test_css_search_uses_custom_selectors_with_arguments set = if Nokogiri.uses_libxml? @xml.css('employee > address:my_filter("domestic", "Yes")', @handler) else @xml.xpath("//employee/address[nokogiri:my_filter(., \"domestic\", \"Yes\")]", @ns, @handler) end assert set.length > 0 set.each do |node| assert_equal 'Yes', node['domestic'] end end def test_css_search_uses_custom_selectors set = @xml.xpath('//employee') if Nokogiri.uses_libxml? @xml.css('employee:thing()', @handler) else @xml.xpath("//employee[nokogiri:thing(.)]", @ns, @handler) end assert_equal(set.length, @handler.things.length) assert_equal(set.to_a, @handler.things.flatten) end def test_pass_self_to_function set = if Nokogiri.uses_libxml? @xml.xpath('//employee/address[my_filter(., "domestic", "Yes")]', @handler) else @xml.xpath('//employee/address[nokogiri:my_filter(., "domestic", "Yes")]', @ns, @handler) end assert set.length > 0 set.each do |node| assert_equal 'Yes', node['domestic'] end end def test_custom_xpath_function_gets_strings set = @xml.xpath('//employee') if Nokogiri.uses_libxml? 
@xml.xpath('//employee[thing("asdf")]', @handler) else @xml.xpath('//employee[nokogiri:thing("asdf")]', @ns, @handler) end assert_equal(set.length, @handler.things.length) assert_equal(['asdf'] * set.length, @handler.things) end def test_custom_xpath_function_returns_string if Nokogiri.uses_libxml? result = @xml.xpath('thing("asdf")', @handler) else result = @xml.xpath('nokogiri:thing("asdf")', @ns, @handler) end assert_equal 'asdf', result end def test_custom_xpath_gets_true_booleans set = @xml.xpath('//employee') if Nokogiri.uses_libxml? @xml.xpath('//employee[thing(true())]', @handler) else @xml.xpath("//employee[nokogiri:thing(true())]", @ns, @handler) end assert_equal(set.length, @handler.things.length) assert_equal([true] * set.length, @handler.things) end def test_custom_xpath_gets_false_booleans set = @xml.xpath('//employee') if Nokogiri.uses_libxml? @xml.xpath('//employee[thing(false())]', @handler) else @xml.xpath("//employee[nokogiri:thing(false())]", @ns, @handler) end assert_equal(set.length, @handler.things.length) assert_equal([false] * set.length, @handler.things) end def test_custom_xpath_gets_numbers set = @xml.xpath('//employee') if Nokogiri.uses_libxml? @xml.xpath('//employee[thing(10)]', @handler) else @xml.xpath('//employee[nokogiri:thing(10)]', @ns, @handler) end assert_equal(set.length, @handler.things.length) assert_equal([10] * set.length, @handler.things) end def test_custom_xpath_gets_node_sets set = @xml.xpath('//employee/name') if Nokogiri.uses_libxml? @xml.xpath('//employee[thing(name)]', @handler) else @xml.xpath('//employee[nokogiri:thing(name)]', @ns, @handler) end assert_equal(set.length, @handler.things.length) assert_equal(set.to_a, @handler.things.flatten) end def test_custom_xpath_gets_node_sets_and_returns_array set = @xml.xpath('//employee/name') if Nokogiri.uses_libxml? 
@xml.xpath('//employee[returns_array(name)]', @handler) else @xml.xpath('//employee[nokogiri:returns_array(name)]', @ns, @handler) end assert_equal(set.length, @handler.things.length) assert_equal(set.to_a, @handler.things.flatten) end def test_custom_xpath_handler_is_passed_a_decorated_node_set x = Module.new do def awesome! ; end end util_decorate(@xml, x) assert @xml.xpath('//employee/name') @xml.xpath('//employee[saves_node_set(name)]', @handler) assert_equal @xml, @handler.things.document assert @handler.things.respond_to?(:awesome!) end def test_code_that_invokes_OP_RESET_inside_libxml2 doc = "hi" xpath = 'id("foo")//foo' nokogiri = Nokogiri::HTML.parse(doc) assert nokogiri.xpath(xpath) end def test_custom_xpath_handler_with_args_under_gc_pressure # see http://github.com/sparklemotion/nokogiri/issues/#issue/345 tool_inspector = Class.new do def name_equals(nodeset, name, *args) nodeset.all? do |node| args.each { |thing| thing.inspect } node["name"] == name end end end.new xml = <<-EOXML #{"" * 10} EOXML doc = Nokogiri::XML xml # long list of long arguments, to apply GC pressure during # ruby_funcall argument marshalling xpath = ["//tool[name_equals(.,'hammer'"] 1000.times { xpath << "'unused argument #{'x' * 1000}'" } xpath << "'unused argument')]" xpath = xpath.join(',') assert_equal doc.xpath("//tool[@name='hammer']"), doc.xpath(xpath, tool_inspector) end def test_custom_xpath_without_arguments if Nokogiri.uses_libxml? 
value = @xml.xpath('value()', @handler) else value = @xml.xpath('nokogiri:value()', @ns, @handler) end assert_equal 123.456, value end def test_custom_xpath_with_bullshit_arguments xml = %q{ } doc = Nokogiri::XML.parse(xml) foo = doc.xpath('//foo[bool_function(bar/baz)]', Class.new { def bool_function(value) true end }.new) assert_equal foo, doc.xpath("//foo") end def test_node_set_should_be_decorated # "called decorate on nill" exception in JRuby issue#514 process_output= < LZ77 END doc = Nokogiri::XML.parse(process_output) node = doc.xpath(%{//track[@type='Video']}) assert_equal "[]", node.xpath("Format").inspect end def test_very_specific_xml_xpath_making_problems_in_jruby # manually merges pull request #681 xml_string = %q{ a } xml_doc = Nokogiri::XML(xml_string) onix = xml_doc.children.first assert_equal 'a', onix.at_xpath('xmlns:Product').at_xpath('xmlns:RecordReference').text end end end end nokogiri-1.6.1/test/xml/test_node_encoding.rb0000644000175000017500000000621312261213762020660 0ustar boutilboutilrequire "helper" module Nokogiri module XML if RUBY_VERSION =~ /^1\.9/ class TestNodeEncoding < Nokogiri::TestCase def setup super @html = Nokogiri::HTML(File.read(HTML_FILE), HTML_FILE) end def test_get_attribute node = @html.css('a').first assert_equal @html.encoding, node['href'].encoding.name end def test_text_encoding_is_utf_8 @html = Nokogiri::HTML(File.open(NICH_FILE)) assert_equal 'UTF-8', @html.text.encoding.name end def test_serialize_encoding_html @html = Nokogiri::HTML(File.open(NICH_FILE)) assert_equal @html.encoding.downcase, @html.serialize.encoding.name.downcase @doc = Nokogiri::HTML(@html.serialize) assert_equal @html.serialize, @doc.serialize end def test_serialize_encoding_xml @xml = Nokogiri::XML(File.open(SHIFT_JIS_XML)) assert_equal @xml.encoding.downcase, @xml.serialize.encoding.name.downcase @doc = Nokogiri::XML(@xml.serialize) assert_equal @xml.serialize, @doc.serialize end def test_encode_special_chars foo = 
@html.css('a').first.encode_special_chars('foo') assert_equal @html.encoding, foo.encoding.name end def test_content node = @html.css('a').first assert_equal @html.encoding, node.content.encoding.name end def test_name node = @html.css('a').first assert_equal @html.encoding, node.name.encoding.name end def test_path node = @html.css('a').first assert_equal @html.encoding, node.path.encoding.name end def test_namespace xml = <<-eoxml Michelin Model XGV I'm a bicycle tire! eoxml doc = Nokogiri::XML(xml, nil, 'UTF-8') assert_equal 'UTF-8', doc.encoding n = doc.xpath('//part:tire', { 'part' => 'http://schwinn.com/' }).first assert n assert_equal doc.encoding, n.namespace.href.encoding.name assert_equal doc.encoding, n.namespace.prefix.encoding.name end def test_namespace_as_hash xml = <<-eoxml Michelin Model XGV I'm a bicycle tire! eoxml doc = Nokogiri::XML(xml, nil, 'UTF-8') assert_equal 'UTF-8', doc.encoding assert n = doc.xpath('//car').first n.namespace_definitions.each do |nd| assert_equal doc.encoding, nd.href.encoding.name assert_equal doc.encoding, nd.prefix.encoding.name end n.namespaces.each do |k,v| assert_equal doc.encoding, k.encoding.name assert_equal doc.encoding, v.encoding.name end end end end end end nokogiri-1.6.1/test/xml/test_cdata.rb0000644000175000017500000000224412261213762017141 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestCDATA < Nokogiri::TestCase def setup super @xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) end def test_cdata_node name = @xml.xpath('//employee[2]/name').first assert cdata = name.children[1] assert cdata.cdata? 
assert_equal '#cdata-section', cdata.name end def test_new node = CDATA.new(@xml, "foo") assert_equal "foo", node.content node = CDATA.new(@xml.root, "foo") assert_equal "foo", node.content end def test_new_with_nil node = CDATA.new(@xml, nil) assert_equal nil, node.content end def test_lots_of_new_cdata assert 100.times { CDATA.new(@xml, "asdfasdf") } end def test_content= node = CDATA.new(@xml, 'foo') assert_equal('foo', node.content) node.content = '& &' assert_equal('& &', node.content) assert_equal(' &]]>', node.to_xml) node.content = 'foo ]]> bar' assert_equal('foo ]]> bar', node.content) end end end end nokogiri-1.6.1/test/xml/test_unparented_node.rb0000644000175000017500000002754512261213762021252 0ustar boutilboutilrequire "helper" require 'stringio' module Nokogiri module XML class TestUnparentedNode < Nokogiri::TestCase def setup begin xml = Nokogiri::XML.parse(File.read(XML_FILE), XML_FILE) @node = xml.at('staff') @node.unlink end GC.start # try to GC the document end def test_node_still_has_document assert @node.document end def test_add_namespace node = @node.at('address') node.unlink node.add_namespace('foo', 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', node.namespaces['xmlns:foo'] end def test_write_to io = StringIO.new @node.write_to io io.rewind assert_equal @node.to_xml, io.read end def test_attribute_with_symbol assert_equal 'Yes', @node.css('address').first[:domestic] end def test_write_to_with_block called = false io = StringIO.new conf = nil @node.write_to io do |config| called = true conf = config config.format.as_html.no_empty_tags end io.rewind assert called assert_equal @node.serialize(:save_with => conf.options), io.read end %w{ xml html xhtml }.each do |type| define_method(:"test_write_#{type}_to") do io = StringIO.new assert @node.send(:"write_#{type}_to", io) io.rewind assert_match @node.send(:"to_#{type}"), io.read end end def test_serialize_with_block called = false conf = nil string = @node.serialize 
do |config| called = true conf = config config.format.as_html.no_empty_tags end assert called assert_equal @node.serialize(nil, conf.options), string end def test_values assert_equal %w{ Yes Yes }, @node.xpath('.//address')[1].values end def test_keys assert_equal %w{ domestic street }, @node.xpath('.//address')[1].keys end def test_each attributes = [] @node.xpath('.//address')[1].each do |key, value| attributes << [key, value] end assert_equal [['domestic', 'Yes'], ['street', 'Yes']], attributes end def test_new assert node = Nokogiri::XML::Node.new('input', @node) assert_equal 1, node.node_type end def test_to_str assert name = @node.xpath('.//name').first assert_match(/Margaret/, '' + name) assert_equal('Margaret Martin', '' + name.children.first) end def test_ancestors assert(address = @node.xpath('.//address').first) assert_equal 2, address.ancestors.length assert_equal ['employee', 'staff'], address.ancestors.map { |x| x ? x.name : x } end def test_read_only? assert entity_decl = @node.internal_subset.children.find { |x| x.type == Node::ENTITY_DECL } assert entity_decl.read_only? 
end def test_remove_attribute address = @node.xpath('./employee/address').first assert_equal 'Yes', address['domestic'] address.remove_attribute 'domestic' assert_nil address['domestic'] end def test_delete address = @node.xpath('./employee/address').first assert_equal 'Yes', address['domestic'] address.delete 'domestic' assert_nil address['domestic'] end def test_add_child_in_same_document child = @node.css('employee').first assert child.children.last assert new_child = child.children.first last = child.children.last child.add_child(new_child) assert_equal new_child, child.children.last assert_equal last, child.children.last end def test_add_child_from_other_document d1 = Nokogiri::XML("12") d2 = Nokogiri::XML("34") d2.at('root').search('item').each do |i| d1.at('root').add_child i end assert_equal 0, d2.search('item').size assert_equal 4, d1.search('item').size end def test_add_child xml = Nokogiri::XML(<<-eoxml) Hello world eoxml text_node = Nokogiri::XML::Text.new('hello', xml) assert_equal Nokogiri::XML::Node::TEXT_NODE, text_node.type xml.root.add_child text_node assert_match 'hello', xml.to_s end def test_chevron_works_as_add_child xml = Nokogiri::XML(<<-eoxml) Hello world eoxml text_node = Nokogiri::XML::Text.new('hello', xml) xml.root << text_node assert_match 'hello', xml.to_s end def test_add_previous_sibling xml = Nokogiri::XML(<<-eoxml) Hello world eoxml b_node = Nokogiri::XML::Node.new('a', xml) assert_equal Nokogiri::XML::Node::ELEMENT_NODE, b_node.type b_node.content = 'first' a_node = xml.xpath('.//a').first a_node.add_previous_sibling(b_node) assert_equal('first', xml.xpath('.//a').first.text) end def test_add_previous_sibling_merge xml = Nokogiri::XML(<<-eoxml) Hello world eoxml assert a_tag = xml.css('a').first left_space = a_tag.previous right_space = a_tag.next assert left_space.text? assert right_space.text? 
left_space.add_previous_sibling(right_space) assert_equal left_space, right_space end def test_add_next_sibling_merge xml = Nokogiri::XML(<<-eoxml) Hello world eoxml assert a_tag = xml.css('a').first left_space = a_tag.previous right_space = a_tag.next assert left_space.text? assert right_space.text? right_space.add_next_sibling(left_space) assert_equal left_space, right_space end def test_add_next_sibling_to_root_raises_exception xml = Nokogiri::XML(<<-eoxml) eoxml node = Nokogiri::XML::Node.new 'child', xml assert_raise(ArgumentError) do xml.root.add_next_sibling(node) end end def test_add_previous_sibling_to_root_raises_exception xml = Nokogiri::XML(<<-eoxml) eoxml node = Nokogiri::XML::Node.new 'child', xml assert_raise(ArgumentError) do xml.root.add_previous_sibling(node) end end def test_add_pi_as_previous_sibling_to_root_is_ok doc = Nokogiri::XML "foo" pi = Nokogiri::XML::ProcessingInstruction.new(doc, "xml-stylesheet", %q{type="text/xsl" href="foo.xsl"}) doc.root.add_previous_sibling pi expected_doc = %Q{\n\nfoo} assert_includes doc.to_xml, expected_doc end def test_find_by_css_with_tilde_eql xml = Nokogiri::XML.parse(<<-eoxml) Hello world Bar Bar Bar Bar Awesome Awesome eoxml set = xml.css('a[@class~="bar"]') assert_equal 4, set.length assert_equal ['Bar'], set.map { |node| node.content }.uniq end def test_unlink xml = Nokogiri::XML.parse(<<-eoxml) Bar Bar Bar Hello world Bar Awesome Awesome eoxml node = xml.xpath('.//a')[3] assert_equal('Hello world', node.text) assert_match(/Hello world/, xml.to_s) assert node.parent assert node.document assert node.previous_sibling assert node.next_sibling node.unlink assert !node.parent # assert !node.document assert !node.previous_sibling assert !node.next_sibling assert_no_match(/Hello world/, xml.to_s) end def test_next_sibling assert sibling = @node.child.next_sibling assert_equal('employee', sibling.name) end def test_previous_sibling assert sibling = @node.child.next_sibling assert_equal('employee', sibling.name) 
assert_equal(sibling.previous_sibling, @node.child) end def test_name= @node.name = 'awesome' assert_equal('awesome', @node.name) end def test_child assert child = @node.child assert_equal('text', child.name) end def test_key? assert node = @node.search('.//address').first assert(!node.key?('asdfasdf')) end def test_set_property assert node = @node.search('.//address').first node['foo'] = 'bar' assert_equal('bar', node['foo']) end def test_attributes assert node = @node.search('.//address').first assert_nil(node['asdfasdfasdf']) assert_equal('Yes', node['domestic']) assert node = @node.search('.//address')[2] attr = node.attributes assert_equal 2, attr.size assert_equal 'Yes', attr['domestic'].value assert_equal 'Yes', attr['domestic'].to_s assert_equal 'No', attr['street'].value end def test_path assert set = @node.search('.//employee') assert node = set.first assert_equal('/staff/employee[1]', node.path) end def test_search_by_symbol assert set = @node.search(:employee) assert_equal 5, set.length assert node = @node.at(:employee) assert node.text =~ /EMP0001/ end def test_new_node node = Nokogiri::XML::Node.new('form', @node.document) assert_equal('form', node.name) assert(node.document) end def test_encode_special_chars foo = @node.css('employee').first.encode_special_chars('&') assert_equal '&', foo end def test_content node = Nokogiri::XML::Node.new('form', @node) assert_equal('', node.content) node.content = 'hello world!' assert_equal('hello world!', node.content) end def test_whitespace_nodes doc = Nokogiri::XML.parse("Foo\nBar

      Bazz

      ") children = doc.at('.//root').children.collect{|j| j.to_s} assert_equal "\n", children[1] assert_equal " ", children[3] end def test_replace set = @node.search('.//employee') assert 5, set.length assert 0, @node.search('.//form').length first = set[0] second = set[1] node = Nokogiri::XML::Node.new('form', @node) first.replace(node) assert set = @node.search('.//employee') assert_equal 4, set.length assert 1, @node.search('.//form').length assert_equal set[0].to_xml, second.to_xml end def test_replace_on_unparented_node foo = Node.new('foo', @node.document) if Nokogiri.jruby? # JRuby Nokogiri doesn't raise an exception @node.replace(foo) else assert_raises(RuntimeError){ @node.replace(foo) } end end def test_illegal_replace_of_node_with_doc new_node = Nokogiri::XML.parse('bar') old_node = @node.at('.//employee') assert_raises(ArgumentError){ old_node.replace new_node } end end end end nokogiri-1.6.1/test/xml/test_node_reparenting.rb0000644000175000017500000003635712261213762021424 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestNodeReparenting < Nokogiri::TestCase describe "standard node reparenting behavior" do # describe "namespace handling during reparenting" do # describe "given a Node" do # describe "with a Namespace" do # it "keeps the Namespace" # end # describe "given a parent Node with a default and a non-default Namespace" do # describe "passed an Node without a namespace" do # it "inserts an Node that inherits the default Namespace" # end # describe "passed a Node with a Namespace that matches the parent's non-default Namespace" do # it "inserts a Node that inherits the matching parent Namespace" # end # end # end # describe "given a markup string" do # describe "parsed relative to the document" do # describe "with a Namespace" do # it "keeps the Namespace" # end # describe "given a parent Node with a default and a non-default Namespace" do # describe "passed an Node without a namespace" do # it "inserts an Node that 
inherits the default Namespace" # end # describe "passed a Node with a Namespace that matches the parent's non-default Namespace" do # it "inserts a Node that inherits the matching parent Namespace" # end # end # end # describe "parsed relative to a specific node" do # describe "with a Namespace" do # it "keeps the Namespace" # end # describe "given a parent Node with a default and a non-default Namespace" do # describe "passed an Node without a namespace" do # it "inserts an Node that inherits the default Namespace" # end # describe "passed a Node with a Namespace that matches the parent's non-default Namespace" do # it "inserts a Node that inherits the matching parent Namespace" # end # end # end # end # end { :add_child => {:target => "/root/a1", :returns_self => false, :children_tags => %w[text b1 b2]}, :<< => {:target => "/root/a1", :returns_self => true, :children_tags => %w[text b1 b2]}, :replace => {:target => "/root/a1/node()", :returns_self => false, :children_tags => %w[b1 b2]}, :swap => {:target => "/root/a1/node()", :returns_self => true, :children_tags => %w[b1 b2]}, :children= => {:target => "/root/a1", :returns_self => false, :children_tags => %w[b1 b2]}, :inner_html= => {:target => "/root/a1", :returns_self => true, :children_tags => %w[b1 b2]}, :add_previous_sibling => {:target => "/root/a1/text()", :returns_self => false, :children_tags => %w[b1 b2 text]}, :previous= => {:target => "/root/a1/text()", :returns_self => false, :children_tags => %w[b1 b2 text]}, :before => {:target => "/root/a1/text()", :returns_self => true, :children_tags => %w[b1 b2 text]}, :add_next_sibling => {:target => "/root/a1/text()", :returns_self => false, :children_tags => %w[text b1 b2]}, :next= => {:target => "/root/a1/text()", :returns_self => false, :children_tags => %w[text b1 b2]}, :after => {:target => "/root/a1/text()", :returns_self => true, :children_tags => %w[text b1 b2]} }.each do |method, params| before do @doc = Nokogiri::XML "First nodeSecond nodeThird 
node" @doc2 = @doc.dup @fragment_string = "foobar" @fragment = Nokogiri::XML::DocumentFragment.parse @fragment_string @node_set = Nokogiri::XML("foobar").xpath("/root/node()") end describe "##{method}" do describe "passed a Node" do [:current, :another].each do |which| describe "passed a Node in the #{which} document" do before do @other_doc = which == :current ? @doc : @doc2 @other_node = @other_doc.at_xpath("/root/a2") end it "unlinks the Node from its previous position" do @doc.at_xpath(params[:target]).send(method, @other_node) @other_doc.at_xpath("/root/a2").must_be_nil end it "inserts the Node in the proper position" do @doc.at_xpath(params[:target]).send(method, @other_node) @doc.at_xpath("/root/a1/a2").wont_be_nil end it "returns the expected value" do sendee = @doc.at_xpath(params[:target]) result = sendee.send(method, @other_node) if params[:returns_self] result.must_equal sendee else result.must_equal @other_node end end end end end describe "passed a markup string" do it "inserts the fragment roots in the proper position" do @doc.at_xpath(params[:target]).send(method, @fragment_string) @doc.xpath("/root/a1/node()").collect {|n| n.name}.must_equal params[:children_tags] end it "returns the expected value" do sendee = @doc.at_xpath(params[:target]) result = sendee.send(method, @fragment_string) if params[:returns_self] result.must_equal sendee else result.must_be_kind_of Nokogiri::XML::NodeSet result.to_html.must_equal @fragment_string end end end describe "passed a fragment" do it "inserts the fragment roots in the proper position" do @doc.at_xpath(params[:target]).send(method, @fragment) @doc.xpath("/root/a1/node()").collect {|n| n.name}.must_equal params[:children_tags] end end describe "passed a document" do it "raises an exception" do proc { @doc.at_xpath("/root/a1").send(method, @doc2) }.must_raise(ArgumentError) end end describe "passed a non-Node" do it "raises an exception" do proc { @doc.at_xpath("/root/a1").send(method, 42) 
}.must_raise(ArgumentError) end end describe "passed a NodeSet" do it "inserts each member of the NodeSet in the proper order" do @doc.at_xpath(params[:target]).send(method, @node_set) @doc.xpath("/root/a1/node()").collect {|n| n.name}.must_equal params[:children_tags] end end end end describe "text node merging" do describe "#add_child" do it "merges the Text node with adjacent Text nodes" do @doc.at_xpath("/root/a1").add_child Nokogiri::XML::Text.new('hello', @doc) @doc.at_xpath("/root/a1/text()").content.must_equal "First nodehello" end end describe "#replace" do it "merges the Text node with adjacent Text nodes" do @doc.at_xpath("/root/a3/bx").replace Nokogiri::XML::Text.new('hello', @doc) @doc.at_xpath("/root/a3/text()").content.must_equal "Third hellonode" end end end end describe "ad hoc node reparenting behavior" do describe "#<<" do it "allows chaining" do doc = Nokogiri::XML::Document.new root = Nokogiri::XML::Element.new('root', doc) doc.root = root child1 = Nokogiri::XML::Element.new('child1', doc) child2 = Nokogiri::XML::Element.new('child2', doc) doc.root << child1 << child2 assert_equal [child1, child2], doc.root.children.to_a end end describe "#add_child" do describe "given a new node with a namespace" do it "keeps the namespace" do doc = Nokogiri::XML::Document.new item = Nokogiri::XML::Element.new('item', doc) doc.root = item entry = Nokogiri::XML::Element.new('entry', doc) entry.add_namespace('tlm', 'http://tenderlovemaking.com') assert_equal 'http://tenderlovemaking.com', entry.namespaces['xmlns:tlm'] item.add_child(entry) assert_equal 'http://tenderlovemaking.com', entry.namespaces['xmlns:tlm'] end end describe "given a parent node with a default namespace" do before do @doc = Nokogiri::XML(<<-eoxml) eoxml end it "inserts a node that inherits the default namespace" do assert node = @doc.at('//xmlns:first') child = Nokogiri::XML::Node.new('second', @doc) node.add_child(child) assert @doc.at('//xmlns:second') end end describe "given a parent node 
with a non-default namespace" do before do @doc = Nokogiri::XML(<<-eoxml) eoxml end describe "and a child node with a namespace matching the parent's non-default namespace" do it "inserts a node that inherits the matching parent namespace" do assert node = @doc.at('//xmlns:first') child = Nokogiri::XML::Node.new('second', @doc) ns = @doc.root.namespace_definitions.detect { |x| x.prefix == "foo" } child.namespace = ns node.add_child(child) assert @doc.at('//foo:second', "foo" => "http://flavorjon.es/") end end end end describe "#add_previous_sibling" do it "should not merge text nodes during the operation" do xml = Nokogiri::XML %Q(text node) replacee = xml.root.children.first replacee.add_previous_sibling "foo

      bar" assert_equal "foo

      bartext node", xml.root.children.to_html end it 'should remove the child node after the operation' do fragment = Nokogiri::HTML::DocumentFragment.parse("ab") node = fragment.children.last node.add_previous_sibling node.children assert_empty node.children, "should have no children" end describe "with a text node before" do it "should not defensively dup the 'before' text node" do xml = Nokogiri::XML %Q(before

      after
      ) pivot = xml.at_css("p") before = xml.root.children.first after = xml.root.children.last pivot.add_previous_sibling("x") assert_equal "after", after.content assert !after.parent.nil?, "unrelated node should not be affected" assert_equal "before", before.content assert !before.parent.nil?, "no need to reparent" end end end describe "#add_next_sibling" do it "should not merge text nodes during the operation" do xml = Nokogiri::XML %Q(text node) replacee = xml.root.children.first replacee.add_next_sibling "foo

      bar" assert_equal "text nodefoo

      bar", xml.root.children.to_html end describe "with a text node after" do it "should not defensively dup the 'after' text node" do xml = Nokogiri::XML %Q(before

      after
      ) pivot = xml.at_css("p") before = xml.root.children.first after = xml.root.children.last pivot.add_next_sibling("x") assert_equal "before", before.content assert !before.parent.nil?, "unrelated node should not be affected" assert_equal "after", after.content assert !after.parent.nil? end end end describe "#replace" do describe "a text node with a text node" do it "should not merge text nodes during the operation" do xml = Nokogiri::XML %Q(text node) replacee = xml.root.children.first replacee.replace "new text node" assert_equal "new text node", xml.root.children.first.content end end describe "when a document has a default namespace" do before do @fruits = Nokogiri::XML(<<-eoxml) eoxml end it "inserts a node with default namespaces" do apple = @fruits.css('apple').first orange = Nokogiri::XML::Node.new('orange', @fruits) apple.replace(orange) assert_equal orange, @fruits.css('orange').first end end end describe "unlinking a node and then reparenting it" do it "not blow up" do # see http://github.com/sparklemotion/nokogiri/issues#issue/22 10.times do begin doc = Nokogiri::XML <<-EOHTML EOHTML assert root = doc.at("root") assert a = root.at("a") assert b = a.at("b") assert c = a.at("c") a.add_next_sibling(b.unlink) c.unlink end GC.start end end end describe "replace-merging text nodes" do [ ['a
      ', 'afoo'], ['a
      b
      ', 'afoob'], ['
      b
      ', 'foob'] ].each do |xml, result| it "doesn't blow up on #{xml}" do doc = Nokogiri::XML.parse(xml) saved_nodes = doc.root.children doc.at_xpath("/root/br").replace(Nokogiri::XML::Text.new('foo', doc)) saved_nodes.each { |child| child.inspect } # try to cause a crash assert_equal result, doc.at_xpath("/root/text()").inner_text end end end end end end end nokogiri-1.6.1/test/xml/test_attribute_decl.rb0000644000175000017500000000363212261213762021061 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestAttributeDecl < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(<<-eoxml) ]> eoxml @attrs = @xml.internal_subset.children @attr_decl = @attrs.first end def test_inspect assert_equal( "#<#{@attr_decl.class.name}:#{sprintf("0x%x", @attr_decl.object_id)} #{@attr_decl.to_s.inspect}>", @attr_decl.inspect ) end def test_type assert_equal 16, @attr_decl.type end def test_class assert_instance_of Nokogiri::XML::AttributeDecl, @attr_decl end def test_content assert_raise NoMethodError do @attr_decl.content end end def test_attributes assert_raise NoMethodError do @attr_decl.attributes end end def test_namespace assert_raise NoMethodError do @attr_decl.namespace end end def test_namespace_definitions assert_raise NoMethodError do @attr_decl.namespace_definitions end end def test_line assert_raise NoMethodError do @attr_decl.line end end def test_attribute_type if Nokogiri.uses_libxml? 
assert_equal 1, @attr_decl.attribute_type else assert_equal 'CDATA', @attr_decl.attribute_type end end def test_default assert_equal '0', @attr_decl.default assert_equal '0', @attrs[1].default end def test_enumeration assert_equal [], @attr_decl.enumeration assert_equal ['check', 'cash'], @attrs[2].enumeration end end end end nokogiri-1.6.1/test/xml/test_dtd_encoding.rb0000644000175000017500000000141712261213762020507 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module XML if RUBY_VERSION =~ /^1\.9/ class TestDTDEncoding < Nokogiri::TestCase def setup super @xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE, 'UTF-8') assert @dtd = @xml.internal_subset end def test_entities @dtd.entities.each do |k,v| assert_equal @xml.encoding, k.encoding.name end end def test_notations @dtd.notations.each do |k,notation| assert_equal 'UTF-8', k.encoding.name %w{ name public_id system_id }.each do |attribute| v = notation.send(:"#{attribute}") || next assert_equal 'UTF-8', v.encoding.name end end end end end end end nokogiri-1.6.1/test/xml/test_syntax_error.rb0000644000175000017500000000036012261213762020621 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestSyntaxError < Nokogiri::TestCase def test_new error = Nokogiri::XML::SyntaxError.new 'hello' assert_equal 'hello', error.message end end end end nokogiri-1.6.1/test/xml/test_relax_ng.rb0000644000175000017500000000316612261213762017670 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestRelaxNG < Nokogiri::TestCase def setup assert @schema = Nokogiri::XML::RelaxNG(File.read(ADDRESS_SCHEMA_FILE)) end def test_parse_with_memory assert_instance_of Nokogiri::XML::RelaxNG, @schema assert_equal 0, @schema.errors.length end def test_new assert schema = Nokogiri::XML::RelaxNG.new( File.read(ADDRESS_SCHEMA_FILE)) assert_instance_of Nokogiri::XML::RelaxNG, schema end def test_parse_with_io xsd = nil File.open(ADDRESS_SCHEMA_FILE, 'rb') { |f| assert xsd = 
Nokogiri::XML::RelaxNG(f) } assert_equal 0, xsd.errors.length end def test_parse_with_errors xml = File.read(ADDRESS_SCHEMA_FILE).sub(/name="/, 'name=') assert_raises(Nokogiri::XML::SyntaxError) { Nokogiri::XML::RelaxNG(xml) } end def test_validate_document doc = Nokogiri::XML(File.read(ADDRESS_XML_FILE)) assert errors = @schema.validate(doc) assert_equal 0, errors.length end def test_validate_invalid_document # Empty address book is not allowed read_doc = '' assert errors = @schema.validate(Nokogiri::XML(read_doc)) assert_equal 1, errors.length end def test_valid? valid_doc = Nokogiri::XML(File.read(ADDRESS_XML_FILE)) invalid_doc = Nokogiri::XML('') assert(@schema.valid?(valid_doc)) assert(!@schema.valid?(invalid_doc)) end end end end nokogiri-1.6.1/test/xml/test_text.rb0000644000175000017500000000244212261213762017051 0ustar boutilboutilrequire "helper" module Nokogiri module XML class TestText < Nokogiri::TestCase def test_css_path doc = Nokogiri.XML " foo something bar bazz " node = doc.root.children[2] assert_instance_of Nokogiri::XML::Text, node assert_equal node, doc.at_css(node.css_path) end def test_inspect node = Text.new('hello world', Document.new) assert_equal "#<#{node.class.name}:#{sprintf("0x%x",node.object_id)} #{node.text.inspect}>", node.inspect end def test_new node = Text.new('hello world', Document.new) assert node assert_equal('hello world', node.content) assert_instance_of Nokogiri::XML::Text, node end def test_lots_of_text 100.times { Text.new('hello world', Document.new) } end def test_new_without_document doc = Document.new node = Nokogiri::XML::Element.new('foo', doc) assert Text.new('hello world', node) end def test_content= node = Text.new('foo', Document.new) assert_equal('foo', node.content) node.content = '& &' assert_equal('& &', node.content) assert_equal('& <foo> &amp;', node.to_xml) end end end end nokogiri-1.6.1/test/xml/test_node_set.rb0000644000175000017500000005575512261213762017704 0ustar boutilboutilrequire "helper" module 
Nokogiri module XML class TestNodeSet < Nokogiri::TestCase class TestNodeSetNamespaces < Nokogiri::TestCase def setup super @xml = Nokogiri.XML('') @list = @xml.xpath('//namespace::*') end def test_include? assert @list.include?(@list.first), 'list should have item' end def test_push @list.push @list.first end def test_delete @list.push @list.first @list.delete @list.first end def test_reference_after_delete first = @list.first @list.delete(first) assert_equal 'http://www.w3.org/XML/1998/namespace', first.href end end def setup super @xml = Nokogiri::XML(File.read(XML_FILE), XML_FILE) @list = @xml.css('employee') end def test_break_works assert_equal 7, @xml.root.elements.each { |x| break 7 } end def test_filter list = @xml.css('address').filter('*[domestic="Yes"]') assert_equal(%w{ Yes } * 4, list.map { |n| n['domestic'] }) end def test_remove_attr @list.each { |x| x['class'] = 'blah' } assert_equal @list, @list.remove_attr('class') @list.each { |x| assert_nil x['class'] } end def test_add_class assert_equal @list, @list.add_class('bar') @list.each { |x| assert_equal 'bar', x['class'] } @list.add_class('bar') @list.each { |x| assert_equal 'bar', x['class'] } @list.add_class('baz') @list.each { |x| assert_equal 'bar baz', x['class'] } end def test_remove_class_with_no_class assert_equal @list, @list.remove_class('bar') @list.each { |e| assert_nil e['class'] } @list.each { |e| e['class'] = '' } assert_equal @list, @list.remove_class('bar') @list.each { |e| assert_nil e['class'] } end def test_remove_class_single @list.each { |e| e['class'] = 'foo bar' } assert_equal @list, @list.remove_class('bar') @list.each { |e| assert_equal 'foo', e['class'] } end def test_remove_class_completely @list.each { |e| e['class'] = 'foo' } assert_equal @list, @list.remove_class @list.each { |e| assert_nil e['class'] } end def test_attribute_set @list.each { |e| assert_nil e['foo'] } [ ['attribute', 'bar'], ['attr', 'biz'], ['set', 'baz'] ].each do |t| @list.send(t.first.to_sym, 'foo', 
t.last) @list.each { |e| assert_equal t.last, e['foo'] } end end def test_attribute_set_with_block @list.each { |e| assert_nil e['foo'] } [ ['attribute', 'bar'], ['attr', 'biz'], ['set', 'baz'] ].each do |t| @list.send(t.first.to_sym, 'foo') { |x| t.last } @list.each { |e| assert_equal t.last, e['foo'] } end end def test_attribute_set_with_hash @list.each { |e| assert_nil e['foo'] } [ ['attribute', 'bar'], ['attr', 'biz'], ['set', 'baz'] ].each do |t| @list.send(t.first.to_sym, 'foo' => t.last) @list.each { |e| assert_equal t.last, e['foo'] } end end def test_attribute_no_args @list.first['foo'] = 'bar' assert_equal @list.first.attribute('foo'), @list.attribute('foo') end def test_search_empty_node_set set = Nokogiri::XML::NodeSet.new(Nokogiri::XML::Document.new) assert_equal 0, set.css('foo').length assert_equal 0, set.xpath('.//foo').length assert_equal 0, set.search('foo').length end def test_xpath_with_custom_object set = @xml.xpath('//staff') custom_employees = set.xpath('//*[awesome(.)]', Class.new { def awesome ns ns.select { |n| n.name == 'employee' } end }.new) assert_equal @xml.xpath('//employee'), custom_employees end def test_css_with_custom_object set = @xml.xpath('//staff') custom_employees = set.css('*:awesome', Class.new { def awesome ns ns.select { |n| n.name == 'employee' } end }.new) assert_equal @xml.xpath('//employee'), custom_employees end def test_search_self set = @xml.xpath('//staff') assert_equal set.to_a, set.search('.').to_a end def test_search_with_custom_object set = @xml.xpath('//staff') custom_employees = set.search('//*[awesome(.)]', Class.new { def awesome ns ns.select { |n| n.name == 'employee' } end }.new) assert_equal @xml.xpath('//employee'), custom_employees end def test_css_searches_match_self html = Nokogiri::HTML("
      ") set = html.xpath("/html/body/div") assert_equal set.first, set.css(".a").first end def test_search_with_css_matches_self html = Nokogiri::HTML("
      ") set = html.xpath("/html/body/div") assert_equal set.first, set.search(".a").first end def test_css_search_with_namespace fragment = Nokogiri::XML.fragment(<<-eoxml) eoxml assert fragment.children.search( 'body', { 'xmlns' => 'http://www.w3.org/1999/xhtml' }) end def test_double_equal assert node_set_one = @xml.xpath('//employee') assert node_set_two = @xml.xpath('//employee') assert_not_equal node_set_one.object_id, node_set_two.object_id assert_equal node_set_one, node_set_two end def test_node_set_not_equal_to_string node_set_one = @xml.xpath('//employee') assert_not_equal node_set_one, "asdfadsf" end def test_out_of_order_not_equal one = @xml.xpath('//employee') two = @xml.xpath('//employee') two.push two.shift assert_not_equal one, two end def test_shorter_is_not_equal node_set_one = @xml.xpath('//employee') node_set_two = @xml.xpath('//employee') node_set_two.delete(node_set_two.first) assert_not_equal node_set_one, node_set_two end def test_pop set = @xml.xpath('//employee') last = set.last assert_equal last, set.pop end def test_shift set = @xml.xpath('//employee') first = set.first assert_equal first, set.shift end def test_shift_empty set = Nokogiri::XML::NodeSet.new(@xml) assert_nil set.shift end def test_pop_empty set = Nokogiri::XML::NodeSet.new(@xml) assert_nil set.pop end def test_first_takes_arguments assert node_set = @xml.xpath('//employee') assert_equal 2, node_set.first(2).length end def test_dup assert node_set = @xml.xpath('//employee') dup = node_set.dup assert_equal node_set.length, dup.length node_set.zip(dup).each do |a,b| assert_equal a, b end end def test_dup_on_empty_set empty_set = Nokogiri::XML::NodeSet.new @xml, [] assert_equal 0, empty_set.dup.length # this shouldn't raise null pointer exception end def test_xmlns_is_automatically_registered doc = Nokogiri::XML(<<-eoxml) eoxml set = doc.css('foo') assert_equal 1, set.css('xmlns|bar').length assert_equal 0, set.css('|bar').length assert_equal 1, 
set.xpath('//xmlns:bar').length assert_equal 1, set.search('xmlns|bar').length assert_equal 1, set.search('//xmlns:bar').length assert set.at('//xmlns:bar') assert set.at('xmlns|bar') assert set.at('bar') end def test_children_has_document set = @xml.root.children assert_instance_of(NodeSet, set) assert_equal @xml, set.document end def test_length_size assert node_set = @xml.search('//employee') assert_equal node_set.length, node_set.size end def test_to_xml assert node_set = @xml.search('//employee') assert node_set.to_xml end def test_inner_html doc = Nokogiri::HTML(<<-eohtml)
      one
      two
        eohtml
        assert html = doc.css('div').inner_html
        assert_match '', html
      end

      def test_gt_string_arg
        assert node_set = @xml.search('//employee')
        assert_equal node_set.xpath('./employeeId'), (node_set > 'employeeId')
      end

      def test_at
        assert node_set = @xml.search('//employee')
        assert_equal node_set.first, node_set.at(0)
      end

      def test_at_xpath
        assert node_set = @xml.search('//employee')
        assert_equal node_set.first.first_element_child, node_set.at_xpath('./employeeId')
      end

      def test_at_css
        assert node_set = @xml.search('//employee')
        assert_equal node_set.first.first_element_child, node_set.at_css('employeeId')
      end

      def test_percent
        assert node_set = @xml.search('//employee')
        assert_equal node_set.first, node_set % 0
      end

      def test_to_ary
        assert node_set = @xml.search('//employee')
        foo = []
        foo += node_set
        assert_equal node_set.length, foo.length
      end

      def test_push
        node = Nokogiri::XML::Node.new('foo', @xml)
        node.content = 'bar'

        assert node_set = @xml.search('//employee')
        node_set.push(node)

        assert node_set.include?(node)
      end

      def test_delete_with_invalid_argument
        employees = @xml.search("//employee")
        positions = @xml.search("//position")

        assert_raises(ArgumentError) { employees.delete(positions) }
      end

      def test_delete_when_present
        employees = @xml.search("//employee")
        wally = employees.first
        assert employees.include?(wally) # testing setup
        length = employees.length

        result = employees.delete(wally)
        assert_equal result, wally
        assert ! employees.include?(wally)
        # assert_equal, not assert: a bare assert treats its second argument as the failure message
        assert_equal length-1, employees.length
      end

      def test_delete_when_not_present
        employees = @xml.search("//employee")
        phb = @xml.search("//position").first
        assert ! employees.include?(phb) # testing setup
        length = employees.length

        result = employees.delete(phb)
        assert_nil result
        # assert_equal, not assert: the length must be unchanged when nothing was deleted
        assert_equal length, employees.length
      end

      def test_delete_on_empty_set
        empty_set = Nokogiri::XML::NodeSet.new @xml, []
        employee = @xml.at_xpath("//employee")
        assert_nil empty_set.delete(employee)
      end

      def test_unlink
        xml = Nokogiri::XML.parse(<<-eoxml)
        Bar Bar Bar Hello world Bar Awesome Awesome
      eoxml set = xml.xpath('//a') set.unlink set.each do |node| assert !node.parent #assert !node.document assert !node.previous_sibling assert !node.next_sibling end assert_no_match(/Hello world/, xml.to_s) end def test_nodeset_search_takes_namespace @xml = Nokogiri::XML.parse(<<-eoxml) Michelin Model XGV I'm a bicycle tire! eoxml set = @xml/'root' assert_equal 1, set.length bike_tire = set.search('//bike:tire', 'bike' => "http://schwinn.com/") assert_equal 1, bike_tire.length end def test_new_nodeset node_set = Nokogiri::XML::NodeSet.new(@xml) assert_equal(0, node_set.length) node = Nokogiri::XML::Node.new('form', @xml) node_set << node assert_equal(1, node_set.length) assert_equal(node, node_set.last) end def test_search_on_nodeset assert node_set = @xml.search('//employee') assert sub_set = node_set.search('.//name') assert_equal(node_set.length, sub_set.length) end def test_negative_index_works assert node_set = @xml.search('//employee') assert_equal node_set.last, node_set[-1] end def test_large_negative_index_returns_nil assert node_set = @xml.search('//employee') assert_nil(node_set[-1 * (node_set.length + 1)]) end def test_node_set_fetches_private_data assert node_set = @xml.search('//employee') set = node_set assert_equal(set[0], set[0]) end def test_node_set_returns_0 assert node_set = @xml.search('//asdkfjhasdlkfjhaldskfh') assert_equal(0, node_set.length) end def test_wrap employees = (@xml/"//employee").wrap("") assert_equal 'wrapper', employees[0].parent.name assert_equal 'employee', @xml.search("//wrapper").first.children[0].name end def test_wrap_a_fragment frag = Nokogiri::XML::DocumentFragment.parse <<-EOXML hello goodbye EOXML employees = frag.xpath ".//employee" employees.wrap("") assert_equal 'wrapper', employees[0].parent.name assert_equal 'employee', frag.at(".//wrapper").children.first.name end def test_wrap_preserves_document_structure assert_equal "employeeId", @xml.at_xpath("//employee").children.detect{|j| ! j.text? 
}.name @xml.xpath("//employeeId[text()='EMP0001']").wrap("") assert_equal "wrapper", @xml.at_xpath("//employee").children.detect{|j| ! j.text? }.name end def test_plus_operator names = @xml.search("name") positions = @xml.search("position") names_len = names.length positions_len = positions.length assert_raises(ArgumentError) { names + positions.first } result = names + positions assert_equal names_len, names.length assert_equal positions_len, positions.length assert_equal names.length + positions.length, result.length names += positions assert_equal result.length, names.length end def test_union names = @xml.search("name") assert_equal(names.length, (names | @xml.search("name")).length) end def test_minus_operator employees = @xml.search("//employee") females = @xml.search("//employee[gender[text()='Female']]") employees_len = employees.length females_len = females.length assert_raises(ArgumentError) { employees - females.first } result = employees - females assert_equal employees_len, employees.length assert_equal females_len, females.length assert_equal employees.length - females.length, result.length employees -= females assert_equal result.length, employees.length end def test_array_index employees = @xml.search("//employee") other = @xml.search("//position").first assert_equal 3, employees.index(employees[3]) assert_nil employees.index(other) end def test_slice_too_far employees = @xml.search("//employee") assert_equal employees.length, employees[0, employees.length + 1].length assert_equal employees.length, employees[0, employees.length].length end def test_slice_on_empty_node_set empty_set = Nokogiri::XML::NodeSet.new @xml, [] assert_equal nil, empty_set[99] assert_equal nil, empty_set[99..101] assert_equal nil, empty_set[99,2] end def test_slice_waaaaaay_off_the_end xml = Nokogiri::XML::Builder.new { root { 100.times { div } } }.doc nodes = xml.css "div" assert_equal 1, nodes.slice(99, 100_000).length assert_equal 0, nodes.slice(100, 100_000).length end 
def test_array_slice_with_start_and_end employees = @xml.search("//employee") assert_equal [employees[1], employees[2], employees[3]], employees[1,3].to_a end def test_array_index_bracket_equivalence employees = @xml.search("//employee") assert_equal [employees[1], employees[2], employees[3]], employees[1,3].to_a assert_equal [employees[1], employees[2], employees[3]], employees.slice(1,3).to_a end def test_array_slice_with_negative_start employees = @xml.search("//employee") assert_equal [employees[2]], employees[-3,1].to_a assert_equal [employees[2], employees[3]], employees[-3,2].to_a end def test_array_slice_with_invalid_args employees = @xml.search("//employee") assert_nil employees[99, 1] # large start assert_nil employees[1, -1] # negative len assert_equal [], employees[1, 0].to_a # zero len end def test_array_slice_with_range employees = @xml.search("//employee") assert_equal [employees[1], employees[2], employees[3]], employees[1..3].to_a assert_equal [employees[0], employees[1], employees[2], employees[3]], employees[0..3].to_a end def test_intersection_with_no_overlap employees = @xml.search("//employee") positions = @xml.search("//position") assert_equal [], (employees & positions).to_a end def test_intersection employees = @xml.search("//employee") first_set = employees[0..2] second_set = employees[2..4] assert_equal [employees[2]], (first_set & second_set).to_a end def test_intersection_on_empty_set empty_set = Nokogiri::XML::NodeSet.new @xml employees = @xml.search("//employee") assert_equal 0, (empty_set & employees).length end def test_include? employees = @xml.search("//employee") yes = employees.first no = @xml.search("//position").first assert employees.include?(yes) assert ! employees.include?(no) end def test_include_on_empty_node_set empty_set = Nokogiri::XML::NodeSet.new @xml, [] employee = @xml.at_xpath("//employee") assert ! 
empty_set.include?(employee) end def test_children employees = @xml.search("//employee") count = 0 employees.each do |employee| count += employee.children.length end set = employees.children assert_equal count, set.length end def test_inspect employees = @xml.search("//employee") inspected = employees.inspect assert_equal "[#{employees.map { |x| x.inspect }.join(', ')}]", inspected end def test_should_not_splode_when_accessing_namespace_declarations_in_a_node_set 2.times do xml = Nokogiri::XML "" node_set = xml.xpath("//namespace::*") assert_equal 1, node_set.size node = node_set.first node.to_s # segfaults in 1.4.0 and earlier # if we haven't segfaulted, let's make sure we handled it correctly assert_instance_of Nokogiri::XML::Namespace, node end end def test_should_not_splode_when_arrayifying_node_set_containing_namespace_declarations xml = Nokogiri::XML "" node_set = xml.xpath("//namespace::*") assert_equal 1, node_set.size node_array = node_set.to_a node = node_array.first node.to_s # segfaults in 1.4.0 and earlier # if we haven't segfaulted, let's make sure we handled it correctly assert_instance_of Nokogiri::XML::Namespace, node end def test_should_not_splode_when_unlinking_node_set_containing_namespace_declarations xml = Nokogiri::XML "" node_set = xml.xpath("//namespace::*") assert_equal 1, node_set.size node_set.unlink end def test_reverse xml = Nokogiri::XML "bd" children = xml.root.children assert_instance_of Nokogiri::XML::NodeSet, children reversed = children.reverse assert_equal reversed[0], children[4] assert_equal reversed[1], children[3] assert_equal reversed[2], children[2] assert_equal reversed[3], children[1] assert_equal reversed[4], children[0] assert_equal children, children.reverse.reverse end def test_node_set_dup_result_has_document_and_is_decorated x = Module.new do def awesome! 
          ; end
        end
        util_decorate(@xml, x)
        node_set = @xml.css("address")
        new_set = node_set.dup
        assert_equal node_set.document, new_set.document
        assert new_set.respond_to?(:awesome!)
      end

      def test_node_set_union_result_has_document_and_is_decorated
        x = Module.new do
          def awesome! ; end
        end
        util_decorate(@xml, x)
        node_set1 = @xml.css("address")
        node_set2 = @xml.css("address")
        new_set = node_set1 | node_set2
        assert_equal node_set1.document, new_set.document
        assert new_set.respond_to?(:awesome!)
      end

      def test_node_set_intersection_result_has_document_and_is_decorated
        x = Module.new do
          def awesome! ; end
        end
        util_decorate(@xml, x)
        node_set1 = @xml.css("address")
        node_set2 = @xml.css("address")
        new_set = node_set1 & node_set2
        assert_equal node_set1.document, new_set.document
        assert new_set.respond_to?(:awesome!)
      end

      def test_node_set_difference_result_has_document_and_is_decorated
        x = Module.new do
          def awesome! ; end
        end
        util_decorate(@xml, x)
        node_set1 = @xml.css("address")
        node_set2 = @xml.css("address")
        new_set = node_set1 - node_set2
        assert_equal node_set1.document, new_set.document
        assert new_set.respond_to?(:awesome!)
      end

      def test_node_set_slice_result_has_document_and_is_decorated
        x = Module.new do
          def awesome! ; end
        end
        util_decorate(@xml, x)
        node_set = @xml.css("address")
        new_set = node_set[0..-1]
        assert_equal node_set.document, new_set.document
        assert new_set.respond_to?(:awesome!)
      end
    end
  end
end
nokogiri-1.6.1/test/xml/test_node_inheritance.rb
# issue#560
require 'helper'

module Nokogiri
  module XML
    class TestNodeInheritance < Nokogiri::TestCase
      MyNode = Class.new Nokogiri::XML::Node
      def setup
        super
        @node = MyNode.new 'foo', Nokogiri::XML::Document.new
        @node['foo'] = 'bar'
      end

      def test_node_name
        assert @node.name == 'foo'
      end

      def test_node_writing_an_attribute_accessing_via_attributes
        assert @node.attributes['foo']
      end

      def test_node_writing_an_attribute_accessing_via_key
        assert @node.key? 'foo'
      end

      def test_node_writing_an_attribute_accessing_via_brackets
        assert @node['foo'] == 'bar'
      end
    end
  end
end
nokogiri-1.6.1/test/decorators/
nokogiri-1.6.1/test/decorators/test_slop.rb
require "helper"

module Nokogiri
  class TestSlop < Nokogiri::TestCase
    def test_description_tag
      doc = Nokogiri.Slop(<<-eoxml)
        <item>
          <title>foo</title>
          <description>this is the foo thing</description>
        </item>
      eoxml

      assert doc.item.title
      assert doc.item._description, 'should have description'
    end
  end
end
nokogiri-1.6.1/test/css/
nokogiri-1.6.1/test/css/test_nthiness.rb
require "helper"

module Nokogiri
  module CSS
    class TestNthiness < Nokogiri::TestCase
      def setup
        super
        doc = <
      row1
      row2
      row3
      row4
      row5
      row6
      row7
      row8
      row9
      row10
      row11
      row12
      row13
      row14
      bold1 italic1 bold2 italic2

      para1

      bold3

      para2

      para3

      para4

      EOF @parser = Nokogiri.HTML doc end def test_even assert_result_rows [2,4,6,8,10,12,14], @parser.search("table/tr:nth(even)") end def test_odd assert_result_rows [1,3,5,7,9,11,13], @parser.search("table/tr:nth(odd)") end def test_2n assert_equal @parser.search("table/tr:nth(even)").inner_text, @parser.search("table/tr:nth(2n)").inner_text end def test_2np1 assert_equal @parser.search("table/tr:nth(odd)").inner_text, @parser.search("table/tr:nth(2n+1)").inner_text end def test_4np3 assert_result_rows [3,7,11], @parser.search("table/tr:nth(4n+3)") end def test_3np4 assert_result_rows [4,7,10,13], @parser.search("table/tr:nth(3n+4)") end def test_mnp3 assert_result_rows [1,2,3], @parser.search("table/tr:nth(-n+3)") end def test_np3 assert_result_rows [3,4,5,6,7,8,9,10,11,12,13,14], @parser.search("table/tr:nth(n+3)") end def test_first assert_result_rows [1], @parser.search("table/tr:first") assert_result_rows [1], @parser.search("table/tr:first()") end def test_last assert_result_rows [14], @parser.search("table/tr:last") assert_result_rows [14], @parser.search("table/tr:last()") end def test_first_child assert_result_rows [1], @parser.search("div/b:first-child"), "bold" assert_result_rows [1], @parser.search("table/tr:first-child") end def test_last_child assert_result_rows [3], @parser.search("div/b:last-child"), "bold" assert_result_rows [14], @parser.search("table/tr:last-child") end def test_first_of_type assert_result_rows [1], @parser.search("table/tr:first-of-type") assert_result_rows [1], @parser.search("div/b:first-of-type"), "bold" end def test_last_of_type assert_result_rows [14], @parser.search("table/tr:last-of-type") assert_result_rows [3], @parser.search("div/b:last-of-type"), "bold" end def test_only_of_type assert_result_rows [1,4], @parser.search("div/p:only-of-type"), "para" end def test_only_child assert_result_rows [4], @parser.search("div/p:only-child"), "para" end def test_empty result = @parser.search("p:empty") assert_equal 1, 
          result.size, "unexpected number of rows returned: '#{result.inner_text}'"
        assert_equal 'empty', result.first['class']
      end

      def test_parent
        result = @parser.search("p:parent")
        assert_equal 5, result.size
        0.upto(3) do |j|
          assert_equal "para#{j+1} ", result[j].inner_text
        end
        assert_equal "not-empty", result[4]['class']
      end

      def test_siblings
        doc = <<-EOF

      p1

      p2

      p3

      p4

      p5

      EOF parser = Nokogiri.HTML doc assert_equal 2, parser.search("#3 ~ p").size assert_equal "p4 p5 ", parser.search("#3 ~ p").inner_text assert_equal 0, parser.search("#5 ~ p").size assert_equal 1, parser.search("#3 + p").size assert_equal "p4 ", parser.search("#3 + p").inner_text assert_equal 0, parser.search("#5 + p").size end def assert_result_rows intarray, result, word="row" assert_equal intarray.size, result.size, "unexpected number of rows returned: '#{result.inner_text}'" assert_equal intarray.map{|j| "#{word}#{j}"}.join(' '), result.inner_text.strip, result.inner_text end end end end nokogiri-1.6.1/test/css/test_xpath_visitor.rb0000644000175000017500000000556112261213762020765 0ustar boutilboutilrequire "helper" module Nokogiri module CSS class TestXPathVisitor < Nokogiri::TestCase def setup super @parser = Nokogiri::CSS::Parser.new end def test_not_simple_selector assert_xpath('//ol/*[not(self::li)]', @parser.parse('ol > *:not(li)')) end def test_not_last_child assert_xpath('//ol/*[not(position() = last())]', @parser.parse('ol > *:not(:last-child)')) end def test_function_calls_allow_at_params assert_xpath("//a[foo(., @href)]", @parser.parse('a:foo(@href)')) assert_xpath("//a[foo(., @a, b)]", @parser.parse('a:foo(@a, b)')) assert_xpath("//a[foo(., a, 10)]", @parser.parse('a:foo(a, 10)')) end def test_namespace_conversion assert_xpath("//aaron:a", @parser.parse('aaron|a')) assert_xpath("//a", @parser.parse('|a')) end def test_namespaced_attribute_conversion assert_xpath("//a[@flavorjones:href]", @parser.parse('a[flavorjones|href]')) assert_xpath("//a[@href]", @parser.parse('a[|href]')) assert_xpath("//*[@flavorjones:href]", @parser.parse('*[flavorjones|href]')) end def test_unknown_psuedo_classes_get_pushed_down assert_xpath("//a[aaron(.)]", @parser.parse('a:aaron')) end def test_unknown_functions_get_dot_plus_args assert_xpath("//a[aaron(.)]", @parser.parse('a:aaron()')) assert_xpath("//a[aaron(., 12)]", @parser.parse('a:aaron(12)')) 
assert_xpath("//a[aaron(., 12, 1)]", @parser.parse('a:aaron(12, 1)')) end def test_class_selectors assert_xpath "//*[contains(concat(' ', normalize-space(@class), ' '), ' red ')]", @parser.parse(".red") end def test_pipe assert_xpath "//a[@id = 'Boing' or starts-with(@id, concat('Boing', '-'))]", @parser.parse("a[id|='Boing']") end def test_custom_functions visitor = Class.new(XPathVisitor) do attr_accessor :awesome def visit_function_aaron node @awesome = true 'aaron() = 1' end end.new ast = @parser.parse('a:aaron()').first assert_equal 'a[aaron() = 1]', visitor.accept(ast) assert visitor.awesome end def test_custom_psuedo_classes visitor = Class.new(XPathVisitor) do attr_accessor :awesome def visit_pseudo_class_aaron node @awesome = true 'aaron() = 1' end end.new ast = @parser.parse('a:aaron').first assert_equal 'a[aaron() = 1]', visitor.accept(ast) assert visitor.awesome end def assert_xpath expecteds, asts expecteds = [expecteds].flatten expecteds.zip(asts).each do |expected, actual| assert_equal expected, actual.to_xpath end end end end end nokogiri-1.6.1/test/css/test_tokenizer.rb0000644000175000017500000001265512261213762020076 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module CSS class Tokenizer alias :scan :scan_setup end end end module Nokogiri module CSS class TestTokenizer < Nokogiri::TestCase def setup super @scanner = Nokogiri::CSS::Tokenizer.new end def test_has @scanner.scan("a:has(b)") assert_tokens( [[:IDENT, "a"], [":", ":"], [:HAS, "has("], [:IDENT, "b"], [:RPAREN, ")"]], @scanner) end def test_unicode @scanner.scan("a日本語") assert_tokens([[:IDENT, 'a日本語']], @scanner) end def test_tokenize_bad_single_quote @scanner.scan("'") assert_tokens([["'", "'"]], @scanner) end def test_not_equal @scanner.scan("h1[a!='Tender Lovemaking']") assert_tokens([ [:IDENT, 'h1'], [:LSQUARE, '['], [:IDENT, 'a'], [:NOT_EQUAL, '!='], [:STRING, "'Tender Lovemaking'"], [:RSQUARE, ']'], ], @scanner) end def test_negation 
@scanner.scan("p:not(.a)") assert_tokens([ [:IDENT, 'p'], [:NOT, ':not('], ['.', '.'], [:IDENT, 'a'], [:RPAREN, ')'], ], @scanner) end def test_function @scanner.scan("script comment()") assert_tokens([ [:IDENT, 'script'], [:S, ' '], [:FUNCTION, 'comment('], [:RPAREN, ')'], ], @scanner) end def test_preceding_selector @scanner.scan("E ~ F") assert_tokens([ [:IDENT, 'E'], [:TILDE, ' ~ '], [:IDENT, 'F'], ], @scanner) end def test_scan_attribute_string @scanner.scan("h1[a='Tender Lovemaking']") assert_tokens([ [:IDENT, 'h1'], [:LSQUARE, '['], [:IDENT, 'a'], [:EQUAL, '='], [:STRING, "'Tender Lovemaking'"], [:RSQUARE, ']'], ], @scanner) @scanner.scan('h1[a="Tender Lovemaking"]') assert_tokens([ [:IDENT, 'h1'], [:LSQUARE, '['], [:IDENT, 'a'], [:EQUAL, '='], [:STRING, '"Tender Lovemaking"'], [:RSQUARE, ']'], ], @scanner) end def test_scan_id @scanner.scan('#foo') assert_tokens([ [:HASH, '#foo'] ], @scanner) end def test_scan_pseudo @scanner.scan('a:visited') assert_tokens([ [:IDENT, 'a'], [':', ':'], [:IDENT, 'visited'] ], @scanner) end def test_scan_star @scanner.scan('*') assert_tokens([ ['*', '*'], ], @scanner) end def test_scan_class @scanner.scan('x.awesome') assert_tokens([ [:IDENT, 'x'], ['.', '.'], [:IDENT, 'awesome'], ], @scanner) end def test_scan_greater @scanner.scan('x > y') assert_tokens([ [:IDENT, 'x'], [:GREATER, ' > '], [:IDENT, 'y'] ], @scanner) end def test_scan_slash @scanner.scan('x/y') assert_tokens([ [:IDENT, 'x'], [:SLASH, '/'], [:IDENT, 'y'] ], @scanner) end def test_scan_doubleslash @scanner.scan('x//y') assert_tokens([ [:IDENT, 'x'], [:DOUBLESLASH, '//'], [:IDENT, 'y'] ], @scanner) end def test_scan_function_selector @scanner.scan('x:eq(0)') assert_tokens([ [:IDENT, 'x'], [':', ':'], [:FUNCTION, 'eq('], [:NUMBER, "0"], [:RPAREN, ')'], ], @scanner) end def test_scan_an_plus_b @scanner.scan('x:nth-child(5n+3)') assert_tokens([ [:IDENT, 'x'], [':', ':'], [:FUNCTION, 'nth-child('], [:NUMBER, '5'], [:IDENT, 'n'], [:PLUS, '+'], [:NUMBER, '3'], 
[:RPAREN, ')'], ], @scanner) @scanner.scan('x:nth-child(-1n+3)') assert_tokens([ [:IDENT, 'x'], [':', ':'], [:FUNCTION, 'nth-child('], [:NUMBER, '-1'], [:IDENT, 'n'], [:PLUS, '+'], [:NUMBER, '3'], [:RPAREN, ')'], ], @scanner) @scanner.scan('x:nth-child(-n+3)') assert_tokens([ [:IDENT, 'x'], [':', ':'], [:FUNCTION, 'nth-child('], [:IDENT, '-n'], [:PLUS, '+'], [:NUMBER, '3'], [:RPAREN, ')'], ], @scanner) end def assert_tokens(tokens, scanner) toks = [] while tok = @scanner.next_token toks << tok end assert_equal(tokens, toks) end end end end nokogiri-1.6.1/test/css/test_parser.rb0000644000175000017500000003502012261213762017347 0ustar boutilboutilrequire "helper" module Nokogiri module CSS class TestParser < Nokogiri::TestCase def setup super @parser = Nokogiri::CSS::Parser.new @parser_with_ns = Nokogiri::CSS::Parser.new({ "xmlns" => "http://default.example.com/", "hoge" => "http://hoge.example.com/", }) end def test_extra_single_quote assert_raises(CSS::SyntaxError) { @parser.parse("'") } end def test_syntax_error_raised assert_raises(CSS::SyntaxError) { @parser.parse("a[x=]") } end def test_function_and_pseudo assert_xpath '//child::text()[position() = 99]', @parser.parse('text():nth-of-type(99)') end def test_find_by_type ast = @parser.parse("a:nth-child(2)").first matches = ast.find_by_type( [:CONDITIONAL_SELECTOR, [:ELEMENT_NAME], [:PSEUDO_CLASS, [:FUNCTION] ] ] ) assert_equal(1, matches.length) assert_equal(ast, matches.first) end def test_to_type ast = @parser.parse("a:nth-child(2)").first assert_equal( [:CONDITIONAL_SELECTOR, [:ELEMENT_NAME], [:PSEUDO_CLASS, [:FUNCTION] ] ], ast.to_type ) end def test_to_a asts = @parser.parse("a:nth-child(2)") assert_equal( [:CONDITIONAL_SELECTOR, [:ELEMENT_NAME, ["a"]], [:PSEUDO_CLASS, [:FUNCTION, ["nth-child("], ["2"]] ] ], asts.first.to_a ) end def test_has assert_xpath "//a[b]", @parser.parse("a:has(b)") assert_xpath "//a[b/c]", @parser.parse("a:has(b > c)") end def test_dashmatch assert_xpath "//a[@class = 'bar' or 
starts-with(@class, concat('bar', '-'))]", @parser.parse("a[@class|='bar']") assert_xpath "//a[@class = 'bar' or starts-with(@class, concat('bar', '-'))]", @parser.parse("a[@class |= 'bar']") end def test_includes assert_xpath "//a[contains(concat(\" \", @class, \" \"),concat(\" \", 'bar', \" \"))]", @parser.parse("a[@class~='bar']") assert_xpath "//a[contains(concat(\" \", @class, \" \"),concat(\" \", 'bar', \" \"))]", @parser.parse("a[@class ~= 'bar']") end def test_function_with_arguments assert_xpath "//*[position() = 2 and self::a]", @parser.parse("a[2]") assert_xpath "//*[position() = 2 and self::a]", @parser.parse("a:nth-child(2)") end def test_carrot assert_xpath "//a[starts-with(@id, 'Boing')]", @parser.parse("a[id^='Boing']") assert_xpath "//a[starts-with(@id, 'Boing')]", @parser.parse("a[id ^= 'Boing']") end def test_suffix_match assert_xpath "//a[substring(@id, string-length(@id) - string-length('Boing') + 1, string-length('Boing')) = 'Boing']", @parser.parse("a[id$='Boing']") assert_xpath "//a[substring(@id, string-length(@id) - string-length('Boing') + 1, string-length('Boing')) = 'Boing']", @parser.parse("a[id $= 'Boing']") end def test_attributes_with_at ## This is non standard CSS assert_xpath "//a[@id = 'Boing']", @parser.parse("a[@id='Boing']") assert_xpath "//a[@id = 'Boing']", @parser.parse("a[@id = 'Boing']") end def test_attributes_with_at_and_stuff ## This is non standard CSS assert_xpath "//a[@id = 'Boing']//div", @parser.parse("a[@id='Boing'] div") end def test_not_equal ## This is non standard CSS assert_xpath "//a[child::text() != 'Boing']", @parser.parse("a[text()!='Boing']") assert_xpath "//a[child::text() != 'Boing']", @parser.parse("a[text() != 'Boing']") end def test_function ## This is non standard CSS assert_xpath "//a[child::text()]", @parser.parse("a[text()]") ## This is non standard CSS assert_xpath "//child::text()", @parser.parse("text()") ## This is non standard CSS assert_xpath "//a[contains(child::text(), 'Boing')]", 
@parser.parse("a[text()*='Boing']") assert_xpath "//a[contains(child::text(), 'Boing')]", @parser.parse("a[text() *= 'Boing']") ## This is non standard CSS assert_xpath "//script//comment()", @parser.parse("script comment()") end def test_nonstandard_nth_selectors ## These are non standard CSS assert_xpath '//a[position() = 1]', @parser.parse('a:first()') assert_xpath '//a[position() = 1]', @parser.parse('a:first') # no parens assert_xpath '//a[position() = 99]', @parser.parse('a:eq(99)') assert_xpath '//a[position() = 99]', @parser.parse('a:nth(99)') assert_xpath '//a[position() = last()]', @parser.parse('a:last()') assert_xpath '//a[position() = last()]', @parser.parse('a:last') # no parens assert_xpath '//a[node()]', @parser.parse('a:parent') end def test_standard_nth_selectors assert_xpath '//a[position() = 1]', @parser.parse('a:first-of-type()') assert_xpath '//a[position() = 1]', @parser.parse('a:first-of-type') # no parens assert_xpath '//a[position() = 99]', @parser.parse('a:nth-of-type(99)') assert_xpath '//a[position() = last()]', @parser.parse('a:last-of-type()') assert_xpath '//a[position() = last()]', @parser.parse('a:last-of-type') # no parens assert_xpath '//a[position() = last()]', @parser.parse('a:nth-last-of-type(1)') assert_xpath '//a[position() = last() - 98]', @parser.parse('a:nth-last-of-type(99)') end def test_nth_child_selectors assert_xpath '//*[position() = 1 and self::a]', @parser.parse('a:first-child') assert_xpath '//*[position() = 99 and self::a]', @parser.parse('a:nth-child(99)') assert_xpath '//*[position() = last() and self::a]', @parser.parse('a:last-child') assert_xpath '//*[position() = last() and self::a]', @parser.parse('a:nth-last-child(1)') assert_xpath '//*[position() = last() - 98 and self::a]', @parser.parse('a:nth-last-child(99)') end def test_miscellaneous_selectors assert_xpath '//*[last() = 1 and self::a]', @parser.parse('a:only-child') assert_xpath '//a[last() = 1]', @parser.parse('a:only-of-type') assert_xpath 
'//a[not(node())]', @parser.parse('a:empty') end def test_nth_a_n_plus_b assert_xpath '//a[(position() mod 2) = 0]', @parser.parse('a:nth-of-type(2n)') assert_xpath '//a[(position() >= 1) and (((position()-1) mod 2) = 0)]', @parser.parse('a:nth-of-type(2n+1)') assert_xpath '//a[(position() mod 2) = 0]', @parser.parse('a:nth-of-type(even)') assert_xpath '//a[(position() >= 1) and (((position()-1) mod 2) = 0)]', @parser.parse('a:nth-of-type(odd)') assert_xpath '//a[(position() >= 3) and (((position()-3) mod 4) = 0)]', @parser.parse('a:nth-of-type(4n+3)') assert_xpath '//a[(position() <= 3) and (((position()-3) mod 1) = 0)]', @parser.parse('a:nth-of-type(-1n+3)') assert_xpath '//a[(position() <= 3) and (((position()-3) mod 1) = 0)]', @parser.parse('a:nth-of-type(-n+3)') assert_xpath '//a[(position() >= 3) and (((position()-3) mod 1) = 0)]', @parser.parse('a:nth-of-type(1n+3)') assert_xpath '//a[(position() >= 3) and (((position()-3) mod 1) = 0)]', @parser.parse('a:nth-of-type(n+3)') assert_xpath '//a[((last()-position()+1) mod 2) = 0]', @parser.parse('a:nth-last-of-type(2n)') assert_xpath '//a[((last()-position()+1) >= 1) and ((((last()-position()+1)-1) mod 2) = 0)]', @parser.parse('a:nth-last-of-type(2n+1)') assert_xpath '//a[((last()-position()+1) mod 2) = 0]', @parser.parse('a:nth-last-of-type(even)') assert_xpath '//a[((last()-position()+1) >= 1) and ((((last()-position()+1)-1) mod 2) = 0)]', @parser.parse('a:nth-last-of-type(odd)') assert_xpath '//a[((last()-position()+1) >= 3) and ((((last()-position()+1)-3) mod 4) = 0)]', @parser.parse('a:nth-last-of-type(4n+3)') assert_xpath '//a[((last()-position()+1) <= 3) and ((((last()-position()+1)-3) mod 1) = 0)]', @parser.parse('a:nth-last-of-type(-1n+3)') assert_xpath '//a[((last()-position()+1) <= 3) and ((((last()-position()+1)-3) mod 1) = 0)]', @parser.parse('a:nth-last-of-type(-n+3)') assert_xpath '//a[((last()-position()+1) >= 3) and ((((last()-position()+1)-3) mod 1) = 0)]', 
@parser.parse('a:nth-last-of-type(1n+3)') assert_xpath '//a[((last()-position()+1) >= 3) and ((((last()-position()+1)-3) mod 1) = 0)]', @parser.parse('a:nth-last-of-type(n+3)') end def test_preceding_selector assert_xpath "//E/following-sibling::F", @parser.parse("E ~ F") assert_xpath "//E/following-sibling::F//G", @parser.parse("E ~ F G") end def test_direct_preceding_selector assert_xpath "//E/following-sibling::*[1]/self::F", @parser.parse("E + F") assert_xpath "//E/following-sibling::*[1]/self::F//G", @parser.parse("E + F G") end def test_child_selector assert_xpath("//a//b/i", @parser.parse('a b>i')) assert_xpath("//a//b/i", @parser.parse('a b > i')) assert_xpath("//a/b/i", @parser.parse('a > b > i')) end def test_prefixless_child_selector assert_xpath("./a", @parser.parse('>a')) assert_xpath("./a", @parser.parse('> a')) assert_xpath("./a//b/i", @parser.parse('>a b>i')) assert_xpath("./a/b/i", @parser.parse('> a > b > i')) end def test_prefixless_preceding_sibling_selector assert_xpath("./following-sibling::a", @parser.parse('~a')) assert_xpath("./following-sibling::a", @parser.parse('~ a')) assert_xpath("./following-sibling::a//b/following-sibling::i", @parser.parse('~a b~i')) assert_xpath("./following-sibling::a//b/following-sibling::i", @parser.parse('~ a b ~ i')) end def test_prefixless_direct_adjacent_selector assert_xpath("./following-sibling::*[1]/self::a", @parser.parse('+a')) assert_xpath("./following-sibling::*[1]/self::a", @parser.parse('+ a')) assert_xpath("./following-sibling::*[1]/self::a/following-sibling::*[1]/self::b", @parser.parse('+a+b')) assert_xpath("./following-sibling::*[1]/self::a/following-sibling::*[1]/self::b", @parser.parse('+ a + b')) end def test_attribute assert_xpath "//h1[@a = 'Tender Lovemaking']", @parser.parse("h1[a='Tender Lovemaking']") end def test_id assert_xpath "//*[@id = 'foo']", @parser.parse('#foo') end def test_pseudo_class_no_ident assert_xpath "//*[link(.)]", @parser.parse(':link') end def test_pseudo_class 
assert_xpath "//a[link(.)]", @parser.parse('a:link') assert_xpath "//a[visited(.)]", @parser.parse('a:visited') assert_xpath "//a[hover(.)]", @parser.parse('a:hover') assert_xpath "//a[active(.)]", @parser.parse('a:active') assert_xpath "//a[active(.) and contains(concat(' ', normalize-space(@class), ' '), ' foo ')]", @parser.parse('a:active.foo') end def test_star assert_xpath "//*", @parser.parse('*') assert_xpath "//*[contains(concat(' ', normalize-space(@class), ' '), ' pastoral ')]", @parser.parse('*.pastoral') end def test_class assert_xpath "//*[contains(concat(' ', normalize-space(@class), ' '), ' a ') and contains(concat(' ', normalize-space(@class), ' '), ' b ')]", @parser.parse('.a.b') assert_xpath "//*[contains(concat(' ', normalize-space(@class), ' '), ' awesome ')]", @parser.parse('.awesome') assert_xpath "//foo[contains(concat(' ', normalize-space(@class), ' '), ' awesome ')]", @parser.parse('foo.awesome') assert_xpath "//foo//*[contains(concat(' ', normalize-space(@class), ' '), ' awesome ')]", @parser.parse('foo .awesome') end def test_not_so_simple_not assert_xpath "//*[@id = 'p' and not(contains(concat(' ', normalize-space(@class), ' '), ' a '))]", @parser.parse('#p:not(.a)') assert_xpath "//p[contains(concat(' ', normalize-space(@class), ' '), ' a ') and not(contains(concat(' ', normalize-space(@class), ' '), ' b '))]", @parser.parse('p.a:not(.b)') assert_xpath "//p[@a = 'foo' and not(contains(concat(' ', normalize-space(@class), ' '), ' b '))]", @parser.parse("p[a='foo']:not(.b)") end def test_ident assert_xpath '//x', @parser.parse('x') end def test_parse_space assert_xpath '//x//y', @parser.parse('x y') end def test_parse_descendant assert_xpath '//x/y', @parser.parse('x > y') end def test_parse_slash ## This is non standard CSS assert_xpath '//x/y', @parser.parse('x/y') end def test_parse_doubleslash ## This is non standard CSS assert_xpath '//x//y', @parser.parse('x//y') end def test_multi_path assert_xpath ['//x/y', '//y/z'], 
                     @parser.parse('x > y, y > z')
        assert_xpath ['//x/y', '//y/z'], @parser.parse('x > y,y > z')

        ###
        # TODO: should we make this work?
        # assert_xpath ['//x/y', '//y/z'], @parser.parse('x > y | y > z')
      end

      def test_attributes_with_namespace
        ## Default namespace is not applied to attributes.
        ## So this must be @class, not @xmlns:class.
        assert_xpath "//xmlns:a[@class = 'bar']", @parser_with_ns.parse("a[class='bar']")
        assert_xpath "//xmlns:a[@hoge:class = 'bar']", @parser_with_ns.parse("a[hoge|class='bar']")
      end

      def assert_xpath expecteds, asts
        expecteds = [expecteds].flatten
        expecteds.zip(asts).each do |expected, actual|
          assert_equal expected, actual.to_xpath
        end
      end
    end
  end
end
nokogiri-1.6.1/test/test_convert_xpath.rb0000644000175000017500000001171012261213762020147 0ustar boutilboutilrequire "helper"

class TestConvertXPath < Nokogiri::TestCase

  def setup
    super
    @N = Nokogiri(File.read(HTML_FILE))
  end

  def assert_syntactical_equivalence(hpath, xpath, match, &blk)
    blk ||= lambda {|j| j.first}
    assert_equal match, blk.call(@N.search(xpath)), "xpath result did not match"
  end

  def test_child_tag
    assert_syntactical_equivalence("h1[a]", ".//h1[child::a]", "Tender Lovemaking") do |j|
      j.inner_text
    end
  end

  def test_child_tag_equals
    assert_syntactical_equivalence("h1[a='Tender Lovemaking']", ".//h1[child::a = 'Tender Lovemaking']", "Tender Lovemaking") do |j|
      j.inner_text
    end
  end

  def test_filter_contains
    assert_syntactical_equivalence("title:contains('Tender')", ".//title[contains(., 'Tender')]",
                                   "Tender Lovemaking ") do |j|
      j.inner_text
    end
  end

  def test_filter_comment
    assert_syntactical_equivalence("div comment()[2]", ".//div//comment()[position() = 2]", "") do |j|
      j.first.to_s
    end
  end

  def test_filter_text
    assert_syntactical_equivalence("a[text()]", ".//a[normalize-space(child::text())]", "
      Tender Lovemaking") do |j| j.first.to_s end assert_syntactical_equivalence("a[text()='Tender Lovemaking']", ".//a[normalize-space(child::text()) = 'Tender Lovemaking']", "Tender Lovemaking") do |j| j.first.to_s end assert_syntactical_equivalence("a/text()", ".//a/child::text()", "Tender Lovemaking") do |j| j.first.to_s end assert_syntactical_equivalence("h2//a[text()!='Back Home!']", ".//h2//a[normalize-space(child::text()) != 'Back Home!']", "Meow meow meow meow meow") do |j| j.first.inner_text end end def test_filter_by_attr assert_syntactical_equivalence("a[@href='http://blog.geminigeek.com/wordpress-theme']", ".//a[@href = 'http://blog.geminigeek.com/wordpress-theme']", "http://blog.geminigeek.com/wordpress-theme") do |j| j.first["href"] end end def test_css_id assert_syntactical_equivalence("#linkcat-7", ".//*[@id = 'linkcat-7']", "linkcat-7") do |j| j.first["id"] end assert_syntactical_equivalence("li#linkcat-7", ".//li[@id = 'linkcat-7']", "linkcat-7") do |j| j.first["id"] end end def test_css_class assert_syntactical_equivalence(".cat-item-15", ".//*[contains(concat(' ', @class, ' '), ' cat-item-15 ')]", "cat-item cat-item-15") do |j| j.first["class"] end assert_syntactical_equivalence("li.cat-item-15", ".//li[contains(concat(' ', @class, ' '), ' cat-item-15 ')]", "cat-item cat-item-15") do |j| j.first["class"] end end def test_css_tags assert_syntactical_equivalence("div li a", ".//div//li//a", "http://brobinius.org/") do |j| j.first.inner_text end assert_syntactical_equivalence("div li > a", ".//div//li/a", "http://brobinius.org/") do |j| j.first.inner_text end assert_syntactical_equivalence("h1 ~ small", ".//small[preceding-sibling::h1]", "The act of making love, tenderly.") do |j| j.first.inner_text end assert_syntactical_equivalence("h1 ~ small", ".//small[preceding-sibling::h1]", "The act of making love, tenderly.") do |j| j.first.inner_text end end def test_positional assert_syntactical_equivalence("div/div:first()", ".//div/div[position() = 
1]", "\r\nTender Lovemaking\r\nThe act of making love, tenderly.\r\n".gsub(/[\r\n]/, '')) do |j| j.first.inner_text.gsub(/[\r\n]/, '') end assert_syntactical_equivalence("div/div:first", ".//div/div[position() = 1]", "\r\nTender Lovemaking\r\nThe act of making love, tenderly.\r\n".gsub(/[\r\n]/, '')) do |j| j.first.inner_text.gsub(/[\r\n]/, '') end assert_syntactical_equivalence("div//a:last()", ".//div//a[position() = last()]", "Wordpress") do |j| j.last.inner_text end assert_syntactical_equivalence("div//a:last", ".//div//a[position() = last()]", "Wordpress") do |j| j.last.inner_text end end def test_multiple_filters assert_syntactical_equivalence("a[@rel='bookmark'][1]", ".//a[@rel = 'bookmark' and position() = 1]", "Back Home!") do |j| j.first.inner_text end end # TODO: # doc/'title ~ link' -> links that are siblings of title # doc/'p[@class~="final"]' -> class includes string (whitespacy) # doc/'p[text()*="final"]' -> class includes string (index) (broken: always returns true?) # doc/'p[text()$="final"]' -> /final$/ # doc/'p[text()|="final"]' -> /^final$/ # doc/'p[text()^="final"]' -> string starts with 'final # nth_first # nth_last # even # odd # first-child, nth-child, last-child, nth-last-child, nth-last-of-type # only-of-type, only-child # parent # empty # root end nokogiri-1.6.1/test/html/0000755000175000017500000000000012261213762014643 5ustar boutilboutilnokogiri-1.6.1/test/html/test_builder.rb0000644000175000017500000001103012261213762017650 0ustar boutilboutilrequire "helper" module Nokogiri module HTML class TestBuilder < Nokogiri::TestCase def test_top_level_function_builds foo = nil Nokogiri() { |xml| foo = xml } assert_instance_of Nokogiri::HTML::Builder, foo end def test_builder_with_explicit_tags html_doc = Nokogiri::HTML::Builder.new { div.slide(:class => 'another_class') { node = Nokogiri::XML::Node.new("id", doc) node.content = "hello" insert(node) } }.doc assert_equal 1, html_doc.css('div.slide > id').length assert_equal 'hello', 
                     html_doc.at('div.slide > id').content
      end

      def test_hash_as_attributes_for_attribute_method
        html = Nokogiri::HTML::Builder.new { ||
          div.slide(:class => 'another_class') {
            span 'Slide 1'
          }
        }.to_html
        assert_match 'class="slide another_class"', html
      end

      def test_hash_as_attributes
        builder = Nokogiri::HTML::Builder.new do
          div(:id => 'awesome') {
            h1 "america"
          }
        end
        assert_equal('<div id="awesome"><h1>america</h1></div>',
                     builder.doc.root.to_html.gsub(/\n/, '').gsub(/>\s*</, '><'))
      end

      def test_href_with_attributes
        uri = 'http://tenderlovemaking.com/'
        built = Nokogiri::XML::Builder.new {
          div {
            a('King Khan & The Shrines', :href => uri)
          }
        }
        assert_equal 'http://tenderlovemaking.com/', built.doc.at('a')[:href]
      end

      def test_tag_nesting
        builder = Nokogiri::HTML::Builder.new do
          body {
            span.left ''
            span.middle {
              div.icon ''
            }
            span.right ''
          }
        end
        assert node = builder.doc.css('span.right').first
        assert_equal 'middle', node.previous_sibling['class']
      end

      def test_has_ampersand
        builder = Nokogiri::HTML::Builder.new do
          div.rad.thing! {
            text "<awe&some>"
            b "hello & world"
          }
        end
        assert_equal(
          '<div class="rad" id="thing">&lt;awe&amp;some&gt;<b>hello &amp; world</b></div>',
          builder.doc.root.to_html.gsub(/\n/, ''))
      end

      def test_multi_tags
        builder = Nokogiri::HTML::Builder.new do
          div.rad.thing! {
            text "<awesome>"
            b "hello"
          }
        end
        assert_equal(
          '<div class="rad" id="thing">&lt;awesome&gt;<b>hello</b></div>',
          builder.doc.root.to_html.gsub(/\n/, ''))
      end

      def test_attributes_plus_block
        builder = Nokogiri::HTML::Builder.new do
          div.rad.thing! {
            text "<awesome>"
          }
        end
        assert_equal('<div class="rad" id="thing">&lt;awesome&gt;</div>',
                     builder.doc.root.to_html.chomp)
      end

      def test_builder_adds_attributes
        builder = Nokogiri::HTML::Builder.new do
          div.rad.thing! "tender div"
        end
        assert_equal('<div class="rad" id="thing">tender div</div>',
                     builder.doc.root.to_html.chomp)
      end

      def test_bold_tag
        builder = Nokogiri::HTML::Builder.new do
          b "bold tag"
        end
        assert_equal('<b>bold tag</b>', builder.doc.root.to_html.chomp)
      end

      def test_html_then_body_tag
        builder = Nokogiri::HTML::Builder.new do
          html {
            body {
              b "bold tag"
            }
          }
        end
        assert_equal('<html><body><b>bold tag</b></body></html>',
                     builder.doc.root.to_html.chomp.gsub(/>\s*</, '><'))
      end

      def test_instance_eval_with_delegation_to_block_context
        class << self
          def foo
            "foo!"
          end
        end

        builder = Nokogiri::HTML::Builder.new {
          text foo
        }
        assert builder.to_html.include?("foo!")
      end

      def test_builder_with_param
        doc = Nokogiri::HTML::Builder.new { |html|
          html.body {
            html.p "hello world"
          }
        }.doc

        assert node = doc.xpath('//body/p').first
        assert_equal 'hello world', node.content
      end

      def test_builder_with_id
        text = "hello world"
        doc = Nokogiri::HTML::Builder.new { |html|
          html.body {
            html.id_ text
          }
        }.doc

        assert node = doc.xpath('//body/id').first
        assert_equal text, node.content
      end
    end
  end
end
nokogiri-1.6.1/test/html/test_document_fragment.rb0000644000175000017500000002230212261213762021727 0ustar boutilboutil# -*- coding: utf-8 -*-
require "helper"

module Nokogiri
  module HTML
    class TestDocumentFragment < Nokogiri::TestCase
      def setup
        super
        @html = Nokogiri::HTML.parse(File.read(HTML_FILE), HTML_FILE)
      end

      if RUBY_VERSION >= '1.9'
        def test_inspect_encoding
          fragment = "
      こんにちは!
      ".encode('EUC-JP') f = Nokogiri::HTML::DocumentFragment.parse fragment assert_equal "こんにちは!", f.content end def test_html_parse_encoding fragment = "
      こんにちは!
      ".encode 'EUC-JP' f = Nokogiri::HTML.fragment fragment assert_equal 'EUC-JP', f.document.encoding assert_equal "こんにちは!", f.content end end def test_colons_are_not_removed doc = Nokogiri::HTML::DocumentFragment.parse("3:30pm") assert_match(/3:30/, doc.to_s) end def test_parse_encoding fragment = "
      hello world
      " f = Nokogiri::HTML::DocumentFragment.parse fragment, 'ISO-8859-1' assert_equal 'ISO-8859-1', f.document.encoding assert_equal "hello world", f.content end def test_html_parse_with_encoding fragment = "
      hello world
      " f = Nokogiri::HTML.fragment fragment, 'ISO-8859-1' assert_equal 'ISO-8859-1', f.document.encoding assert_equal "hello world", f.content end def test_parse_in_context assert_equal('
      ', @html.root.parse('
      ').to_s) end def test_inner_html= fragment = Nokogiri::HTML.fragment '
      ' fragment.inner_html = "hello" assert_equal 'hello', fragment.inner_html end def test_ancestors_search html = %q{
      • foo
      } fragment = Nokogiri::HTML.fragment html li = fragment.at('li') assert li.matches?('li') end def test_fun_encoding string = %Q(こんにちは) html = Nokogiri::HTML::DocumentFragment.parse( string ).to_html(:encoding => 'UTF-8') assert_equal string, html end def test_new assert Nokogiri::HTML::DocumentFragment.new(@html) end def test_body_fragment_should_contain_body fragment = Nokogiri::HTML::DocumentFragment.parse("
      foo
      ") assert_match(/^/, fragment.to_s) end def test_nonbody_fragment_should_not_contain_body fragment = Nokogiri::HTML::DocumentFragment.parse("
      foo
      ") assert_match(/^
      /, fragment.to_s) end def test_fragment_should_have_document fragment = Nokogiri::HTML::DocumentFragment.new(@html) assert_equal @html, fragment.document end def test_empty_fragment_should_be_searchable_by_css fragment = Nokogiri::HTML.fragment("") assert_equal 0, fragment.css("a").size end def test_empty_fragment_should_be_searchable fragment = Nokogiri::HTML.fragment("") assert_equal 0, fragment.search("//a").size end def test_name fragment = Nokogiri::HTML::DocumentFragment.new(@html) assert_equal '#document-fragment', fragment.name end def test_static_method fragment = Nokogiri::HTML::DocumentFragment.parse("
      a
      ") assert_instance_of Nokogiri::HTML::DocumentFragment, fragment end def test_many_fragments 100.times { Nokogiri::HTML::DocumentFragment.new(@html) } end def test_subclass klass = Class.new(Nokogiri::HTML::DocumentFragment) fragment = klass.new(@html, "
      a
      ") assert_instance_of klass, fragment end def test_subclass_parse klass = Class.new(Nokogiri::HTML::DocumentFragment) doc = klass.parse("
      a
      ") assert_instance_of klass, doc end def test_html_fragment fragment = Nokogiri::HTML.fragment("
      a
      ") assert_equal "
      a
      ", fragment.to_s end def test_html_fragment_has_outer_text doc = "a
      b
      c" fragment = Nokogiri::HTML::Document.new.fragment(doc) if Nokogiri.uses_libxml? && Nokogiri::VERSION_INFO['libxml']['loaded'] <= "2.6.16" assert_equal "a
      b

      c

      ", fragment.to_s else assert_equal "a
      b
      c", fragment.to_s end end def test_html_fragment_case_insensitivity doc = "
      b
      " fragment = Nokogiri::HTML::Document.new.fragment(doc) assert_equal "
      b
      ", fragment.to_s end def test_html_fragment_with_leading_whitespace doc = "
      b
      " fragment = Nokogiri::HTML::Document.new.fragment(doc) assert_match %r%
      b
      *%, fragment.to_s end def test_html_fragment_with_leading_whitespace_and_newline doc = " \n
      b
      " fragment = Nokogiri::HTML::Document.new.fragment(doc) assert_match %r% \n
      b
      *%, fragment.to_s end def test_html_fragment_with_leading_text_and_newline fragment = HTML::Document.new.fragment("First line\nSecond line
      Broken line") assert_equal fragment.to_s, "First line\nSecond line
      Broken line" end def test_html_fragment_with_leading_whitespace_and_text_and_newline fragment = HTML::Document.new.fragment(" First line\nSecond line
      Broken line") assert_equal " First line\nSecond line
      Broken line", fragment.to_s end def test_html_fragment_with_leading_entity failed = ""test
      test"" fragment = Nokogiri::HTML::DocumentFragment.parse(failed) assert_equal '"test
      test"', fragment.to_html end def test_to_s doc = "foo
      bar" fragment = Nokogiri::HTML::Document.new.fragment(doc) assert_equal "foo
      bar", fragment.to_s end def test_to_html doc = "foo
      bar" fragment = Nokogiri::HTML::Document.new.fragment(doc) assert_equal "foo
      bar", fragment.to_html end def test_to_xhtml doc = "foo
      bar" fragment = Nokogiri::HTML::Document.new.fragment(doc) if Nokogiri.jruby? || Nokogiri::VERSION_INFO['libxml']['loaded'] >= "2.7.0" assert_equal "foo
      bar", fragment.to_xhtml else # FIXME: why are we doing this ? this violates the spec, # see http://www.w3.org/TR/xhtml1/#C_2 assert_equal "foo
      bar", fragment.to_xhtml end end def test_to_xml doc = "foo
      bar" fragment = Nokogiri::HTML::Document.new.fragment(doc) assert_equal "foo
      bar", fragment.to_xml end def test_fragment_script_tag_with_cdata doc = HTML::Document.new fragment = doc.fragment("") assert_equal("", fragment.to_s) end def test_fragment_with_comment doc = HTML::Document.new fragment = doc.fragment("

      hello

      ") assert_equal("

      hello

      ", fragment.to_s) end def test_malformed_fragment_is_corrected fragment = HTML::DocumentFragment.parse("
      ") assert_equal "
      ", fragment.to_s end def test_unclosed_script_tag # see GH#315 fragment = HTML::DocumentFragment.parse("foo ", fragment.to_html end def test_error_propagation_on_fragment_parse frag = Nokogiri::HTML::DocumentFragment.parse "oh, hello there." assert frag.errors.any?{|err| err.to_s =~ /Tag hello invalid/}, "errors should be copied to the fragment" end def test_error_propagation_on_fragment_parse_in_node_context doc = Nokogiri::HTML::Document.parse "
      " context_node = doc.at_css "div" frag = Nokogiri::HTML::DocumentFragment.new doc, "oh, hello there.", context_node assert frag.errors.any?{|err| err.to_s =~ /Tag hello invalid/}, "errors should be on the context node's document" end def test_error_propagation_on_fragment_parse_in_node_context_should_not_include_preexisting_errors doc = Nokogiri::HTML::Document.parse "
      " assert doc.errors.any?{|err| err.to_s =~ /jimmy/}, "assert on setup" context_node = doc.at_css "div" frag = Nokogiri::HTML::DocumentFragment.new doc, "oh, hello there.", context_node assert frag.errors.any?{|err| err.to_s =~ /Tag hello invalid/}, "errors should be on the context node's document" assert frag.errors.none?{|err| err.to_s =~ /jimmy/}, "errors should not include pre-existing document errors" end end end end nokogiri-1.6.1/test/html/test_node.rb0000644000175000017500000001342712261213762017163 0ustar boutilboutilrequire "helper" require 'nkf' module Nokogiri module HTML class TestNode < Nokogiri::TestCase def setup super @html = Nokogiri::HTML(<<-eohtml) eohtml end def test_to_a assert_equal [['class', 'bar'], ['href', 'foo']],@html.at('a').to_a.sort end def test_attr node = @html.at('div.baz') assert_equal node['class'], node.attr('class') end def test_get_attribute element = @html.at('div') assert_equal 'baz', element.get_attribute('class') assert_equal 'baz', element['class'] element['href'] = "javascript:alert(\"AGGA-KA-BOO!\")" assert_match(/%22AGGA-KA-BOO!%22/, element.to_html) end # The HTML parser ignores namespaces, so even properly declared namespaces # are treated as as undeclared and have to be accessed via prefix:tagname def test_ns_attribute html = '' doc = Nokogiri::HTML(html) assert_equal 'baz', (doc%'i')['foo:bar'] end def test_css_path_round_trip doc = Nokogiri::HTML(File.read(HTML_FILE)) %w{ #header small div[2] div.post body }.each do |css_sel| ele = doc.at css_sel assert_equal ele, doc.at(ele.css_path), ele.css_path end end def test_path_round_trip doc = Nokogiri::HTML(File.read(HTML_FILE)) %w{ #header small div[2] div.post body }.each do |css_sel| ele = doc.at css_sel assert_equal ele, doc.at(ele.path), ele.path end end def test_append_with_document assert_raises(ArgumentError) do @html.root << Nokogiri::HTML::Document.new end end ### # Make sure a document that doesn't declare a meta encoding returns # nil. 
def test_meta_encoding assert_nil @html.meta_encoding end def test_description assert desc = @html.at('a.bar').description assert_equal 'a', desc.name end def test_ancestors_with_selector assert node = @html.at('a.bar').child assert list = node.ancestors('.baz') assert_equal 1, list.length assert_equal 'div', list.first.name end def test_matches_inside_fragment fragment = DocumentFragment.new @html fragment << XML::Node.new('a', @html) a = fragment.children.last assert a.matches?('a'), 'a should match' end def test_css_matches? assert node = @html.at('a.bar') assert node.matches?('a.bar') end def test_xpath_matches? assert node = @html.at('//a') assert node.matches?('//a') end def test_unlink_then_swap node = @html.at('a') node.unlink another_node = @html.at('div') assert another_node, 'should have a node' # This used to segv assert node.add_previous_sibling another_node end def test_swap @html.at('div').swap('bar') a_tag = @html.css('a').first assert_equal 'body', a_tag.parent.name assert_equal 0, @html.css('div').length end def test_swap_with_regex_characters @html.at('div').swap('ba)r') a_tag = @html.css('a').first assert_equal 'ba)r', a_tag.text end def test_attribute_decodes_entities node = @html.at('div') node['href'] = 'foo&bar' assert_equal 'foo&bar', node['href'] node['href'] += '&baz' assert_equal 'foo&bar&baz', node['href'] end def test_parse_config_option node = @html.at('div') options = nil node.parse("
      ") do |config| options = config end assert_equal Nokogiri::XML::ParseOptions::DEFAULT_HTML, options.to_i end def test_fragment_handler_does_not_regurge_on_invalid_attributes iframe = %Q{} assert @html.at('div').fragment(iframe) end def test_fragment fragment = @html.fragment(<<-eohtml) hello

      bar

      world eohtml assert_match(/^hello/, fragment.inner_html.strip) assert_equal 3, fragment.children.length assert p_tag = fragment.css('p').first assert_equal 'div', p_tag.parent.name assert_equal 'foo', p_tag.parent['class'] end def test_fragment_serialization fragment = Nokogiri::HTML.fragment("
      foo
      ") assert_equal "
      foo
      ", fragment.serialize.chomp assert_equal "
      foo
      ", fragment.to_xml.chomp assert_equal "
      foo
      ", fragment.inner_html assert_equal "
      foo
      ", fragment.to_html assert_equal "
      foo
      ", fragment.to_s end def test_to_html_does_not_contain_entities return unless defined?(NKF) # NKF is not implemented on Rubinius as of 2009-11-23 html = NKF.nkf("-e --msdos", <<-EOH)

      test paragraph foo bar

      EOH nokogiri = Nokogiri::HTML.parse(html) if RUBY_PLATFORM =~ /java/ # NKF linebreak modes are not supported as of jruby 1.2 # see http://jira.codehaus.org/browse/JRUBY-3602 for status assert_equal "

      testparagraph\nfoobar

      ", nokogiri.at("p").to_html.gsub(/ /, '') else assert_equal "

      testparagraph\r\nfoobar

      ", nokogiri.at("p").to_html.gsub(/ /, '') end end end end end nokogiri-1.6.1/test/html/test_document_encoding.rb0000644000175000017500000001117412261213762021717 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module HTML if RUBY_VERSION =~ /^1\.9/ class TestDocumentEncoding < Nokogiri::TestCase def test_encoding doc = Nokogiri::HTML File.open(SHIFT_JIS_HTML, 'rb') hello = "こんにちは" assert_match doc.encoding, doc.to_html assert_match hello.encode('Shift_JIS'), doc.to_html assert_equal 'Shift_JIS', doc.to_html.encoding.name assert_match hello, doc.to_html(:encoding => 'UTF-8') assert_match 'UTF-8', doc.to_html(:encoding => 'UTF-8') assert_match 'UTF-8', doc.to_html(:encoding => 'UTF-8').encoding.name end def test_default_to_encoding_from_string bad_charset = <<-eohtml blah! eohtml doc = Nokogiri::HTML(bad_charset) assert_equal bad_charset.encoding.name, doc.encoding doc = Nokogiri.parse(bad_charset) assert_equal bad_charset.encoding.name, doc.encoding end def test_encoding_non_utf8 orig = '日本語が上手です' bin = Encoding::ASCII_8BIT [Encoding::Shift_JIS, Encoding::EUC_JP].each do |enc| html = <<-eohtml.encode(enc) #{orig} eohtml text = Nokogiri::HTML.parse(html).at('title').inner_text assert_equal( orig.encode(enc).force_encoding(bin), text.encode(enc).force_encoding(bin) ) end end def test_encoding_with_a_bad_name bad_charset = <<-eohtml blah! 
eohtml doc = Nokogiri::HTML(bad_charset, nil, 'askldjfhalsdfjhlkasdfjh') assert_equal ['http://tenderlovemaking.com/'], doc.css('a').map { |a| a['href'] } end end end class TestDocumentEncodingDetection < Nokogiri::TestCase if IO.respond_to?(:binread) def binread(file) IO.binread(file) end else def binread(file) IO.read(file) end end def binopen(file) File.open(file, 'rb') end def test_document_html_noencoding from_stream = Nokogiri::HTML(binopen(NOENCODING_FILE)) from_string = Nokogiri::HTML(binread(NOENCODING_FILE)) assert_equal from_string.to_s.size, from_stream.to_s.size end def test_document_html_charset html = Nokogiri::HTML(binopen(METACHARSET_FILE)) assert_equal 'iso-2022-jp', html.encoding assert_equal 'たこ焼き仮面', html.title end def test_document_xhtml_enc [ENCODING_XHTML_FILE, ENCODING_HTML_FILE].each { |file| doc_from_string_enc = Nokogiri::HTML(binread(file), nil, 'Shift_JIS') ary_from_string_enc = doc_from_string_enc.xpath('//p/text()').map { |text| text.text } doc_from_string = Nokogiri::HTML(binread(file)) ary_from_string = doc_from_string.xpath('//p/text()').map { |text| text.text } doc_from_file_enc = Nokogiri::HTML(binopen(file), nil, 'Shift_JIS') ary_from_file_enc = doc_from_file_enc.xpath('//p/text()').map { |text| text.text } doc_from_file = Nokogiri::HTML(binopen(file)) ary_from_file = doc_from_file.xpath('//p/text()').map { |text| text.text } title = 'たこ焼き仮面' assert_equal(title, doc_from_string_enc.at('//title/text()').text) assert_equal(title, doc_from_string.at('//title/text()').text) assert_equal(title, doc_from_file_enc.at('//title/text()').text) unless Nokogiri.jruby? 
&& file == ENCODING_HTML_FILE assert_equal(title, doc_from_file.at('//title/text()').text) end evil = (0..72).map { |i| '超' * i + '悪い事を構想中。' } assert_equal(evil, ary_from_string_enc) assert_equal(evil, ary_from_string) assert_equal(evil, ary_from_file_enc) assert_equal(evil, ary_from_file) } end end end end nokogiri-1.6.1/test/html/sax/0000755000175000017500000000000012261213762015436 5ustar boutilboutilnokogiri-1.6.1/test/html/sax/test_parser_context.rb0000644000175000017500000000202112261213762022055 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module HTML module SAX class TestParserContext < Nokogiri::SAX::TestCase def test_from_io ctx = ParserContext.new StringIO.new('fo'), 'UTF-8' assert ctx end def test_from_string ctx = ParserContext.new 'blah blah' assert ctx end def test_parse_with ctx = ParserContext.new 'blah' assert_raises ArgumentError do ctx.parse_with nil end end def test_parse_with_sax_parser # assert_nothing_raised do xml = "" ctx = ParserContext.new xml parser = Parser.new Doc.new ctx.parse_with parser # end end def test_from_file # assert_nothing_raised do ctx = ParserContext.file HTML_FILE, 'UTF-8' parser = Parser.new Doc.new ctx.parse_with parser # end end end end end end nokogiri-1.6.1/test/html/sax/test_parser.rb0000644000175000017500000001002112261213762020310 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module HTML module SAX class TestParser < Nokogiri::SAX::TestCase def setup super @parser = HTML::SAX::Parser.new(Doc.new) end def test_parse_empty_document # This caused a segfault in libxml 2.6.x assert_nil @parser.parse '' end def test_parse_empty_file # Make sure empty files don't break stuff empty_file_name = File.join(ASSETS_DIR, 'bogus.xml') # assert_nothing_raised do @parser.parse_file empty_file_name # end end def test_parse_file @parser.parse_file(HTML_FILE) # Take a look at the comment in test_parse_document to know # a possible reason to this difference. 
if Nokogiri.uses_libxml? assert_equal 1110, @parser.document.end_elements.length else assert_equal 1119, @parser.document.end_elements.length end end def test_parse_file_nil_argument assert_raises(ArgumentError) { @parser.parse_file(nil) } end def test_parse_file_non_existant assert_raise Errno::ENOENT do @parser.parse_file('there_is_no_reasonable_way_this_file_exists') end end def test_parse_file_with_dir assert_raise Errno::EISDIR do @parser.parse_file(File.dirname(__FILE__)) end end def test_parse_memory_nil assert_raise ArgumentError do @parser.parse_memory(nil) end end def test_parse_force_encoding @parser.parse_memory(<<-HTML, 'UTF-8') Информация HTML assert_equal("Информация", @parser.document.data.join.strip) end def test_parse_document @parser.parse_memory(<<-eoxml)

      Paragraph 1

      Paragraph 2

      eoxml # JRuby version is different because of the internal implementation # JRuby version uses NekoHTML which inserts empty "head" elements. # # Currently following features are set: # "http://cyberneko.org/html/properties/names/elems" => "lower" # "http://cyberneko.org/html/properties/names/attrs" => "lower" if Nokogiri.uses_libxml? assert_equal([["html", []], ["body", []], ["p", []], ["p", []]], @parser.document.start_elements) else assert_equal([["html", []], ["head", []], ["body", []], ["p", []], ["p", []]], @parser.document.start_elements) end end def test_parser_attributes html = <<-eohtml hello
      eohtml block_called = false @parser.parse(html) { |ctx| block_called = true ctx.replace_entities = true } assert block_called noshade_value = if Nokogiri.uses_libxml? && Nokogiri::VERSION_INFO['libxml']['loaded'] < '2.7.7' ['noshade', 'noshade'] else ['noshade', nil] end assert_equal [ ['html', []], ['head', []], ['title', []], ['body', []], ['img', [ ['src', 'face.jpg'], ['title', 'daddy & me'] ]], ['hr', [ noshade_value, ['size', '2'] ]] ], @parser.document.start_elements end def test_empty_processing_instruction @parser.parse_memory("this will segfault") end end end end end nokogiri-1.6.1/test/html/test_document.rb0000644000175000017500000003453312261213762020055 0ustar boutilboutilrequire "helper" module Nokogiri module HTML class TestDocument < Nokogiri::TestCase def setup super @html = Nokogiri::HTML.parse(File.read(HTML_FILE)) end def test_nil_css # Behavior is undefined but shouldn't break assert @html.css(nil) assert @html.xpath(nil) end def test_exceptions_remove_newlines errors = @html.errors assert errors.length > 0, 'has errors' errors.each do |error| assert_equal(error.to_s.chomp, error.to_s) end end def test_fragment fragment = @html.fragment assert_equal 0, fragment.children.length end def test_document_takes_config_block options = nil Nokogiri::HTML(File.read(HTML_FILE), HTML_FILE) do |cfg| options = cfg options.nonet.nowarning.dtdattr end assert options.nonet? assert options.nowarning? assert options.dtdattr? end def test_parse_takes_config_block options = nil Nokogiri::HTML.parse(File.read(HTML_FILE), HTML_FILE) do |cfg| options = cfg options.nonet.nowarning.dtdattr end assert options.nonet? assert options.nowarning? assert options.dtdattr? 
end def test_subclass klass = Class.new(Nokogiri::HTML::Document) doc = klass.new assert_instance_of klass, doc end def test_subclass_initialize klass = Class.new(Nokogiri::HTML::Document) do attr_accessor :initialized_with def initialize(*args) @initialized_with = args end end doc = klass.new("uri", "external_id", 1) assert_equal ["uri", "external_id", 1], doc.initialized_with end def test_subclass_dup klass = Class.new(Nokogiri::HTML::Document) doc = klass.new.dup assert_instance_of klass, doc end def test_subclass_parse klass = Class.new(Nokogiri::HTML::Document) doc = klass.parse(File.read(HTML_FILE)) assert_equal @html.to_s, doc.to_s assert_instance_of klass, doc end def test_document_parse_method html = Nokogiri::HTML::Document.parse(File.read(HTML_FILE)) assert_equal @html.to_s, html.to_s end def test_document_parse_method_with_url require 'open-uri' begin html = open('http://google.com').read rescue skip("This test needs the internet. Skips if no internet available.") end doc = Nokogiri::HTML html ,"http:/foobar.foobar/" refute_empty doc.to_s, "Document should not be empty" end ### # Nokogiri::HTML returns an empty Document when given a blank string GH#11 def test_empty_string_returns_empty_doc doc = Nokogiri::HTML('') assert_instance_of Nokogiri::HTML::Document, doc assert_nil doc.root end unless Nokogiri.uses_libxml? 
&& %w[2 6] === LIBXML_VERSION.split('.')[0..1] # FIXME: this is a hack around broken libxml versions def test_to_xhtml_with_indent doc = Nokogiri::HTML('foo') doc = Nokogiri::HTML(doc.to_xhtml(:indent => 2)) assert_indent 2, doc end def test_write_to_xhtml_with_indent io = StringIO.new doc = Nokogiri::HTML('foo') doc.write_xhtml_to io, :indent => 5 io.rewind doc = Nokogiri::HTML(io.read) assert_indent 5, doc end end def test_swap_should_not_exist assert_raises(NoMethodError) { @html.swap } end def test_namespace_should_not_exist assert_raises(NoMethodError) { @html.namespace } end def test_meta_encoding assert_equal 'UTF-8', @html.meta_encoding end def test_meta_encoding_is_strict_about_http_equiv doc = Nokogiri::HTML(<<-eohtml) foo eohtml assert_nil doc.meta_encoding end def test_meta_encoding_handles_malformed_content_charset doc = Nokogiri::HTML(< foo EOHTML assert_nil doc.meta_encoding end def test_meta_encoding= @html.meta_encoding = 'EUC-JP' assert_equal 'EUC-JP', @html.meta_encoding end def test_title assert_equal 'Tender Lovemaking ', @html.title doc = Nokogiri::HTML('foo') assert_nil doc.title end def test_title=() doc = Nokogiri::HTML(< old foo eohtml doc.title = 'new' assert_equal 'new', doc.title doc = Nokogiri::HTML(< foo eohtml doc.title = 'new' assert_equal 'new', doc.title doc = Nokogiri::HTML(< foo eohtml doc.title = 'new' if Nokogiri.uses_libxml? 
assert_nil doc.title else assert_equal 'new', doc.title end end def test_meta_encoding_without_head html = Nokogiri::HTML('foo') assert_nil html.meta_encoding html.meta_encoding = 'EUC-JP' assert_nil html.meta_encoding end def test_meta_encoding_with_empty_content_type html = Nokogiri::HTML(<<-eohtml) foo eohtml assert_nil html.meta_encoding html = Nokogiri::HTML(<<-eohtml) foo eohtml assert_nil html.meta_encoding end def test_root_node_parent_is_document parent = @html.root.parent assert_equal @html, parent assert_instance_of Nokogiri::HTML::Document, parent end def test_parse_handles_nil_gracefully @doc = Nokogiri::HTML::Document.parse(nil) assert_instance_of Nokogiri::HTML::Document, @doc end def test_parse_empty_document doc = Nokogiri::HTML("\n") assert_equal 0, doc.css('a').length assert_equal 0, doc.xpath('//a').length assert_equal 0, doc.search('//a').length end def test_HTML_function html = Nokogiri::HTML(File.read(HTML_FILE)) assert html.html? end def test_parse_io assert File.open(HTML_FILE, 'rb') { |f| Document.read_io(f, nil, 'UTF-8', XML::ParseOptions::NOERROR | XML::ParseOptions::NOWARNING ) } end def test_parse_temp_file temp_html_file = Tempfile.new("TEMP_HTML_FILE") File.open(HTML_FILE, 'rb') { |f| temp_html_file.write f.read } temp_html_file.close temp_html_file.open assert_equal Nokogiri::HTML.parse(File.read(HTML_FILE)).xpath('//div/a').length, Nokogiri::HTML.parse(temp_html_file).xpath('//div/a').length end def test_to_xhtml assert_match 'XHTML', @html.to_xhtml assert_match 'XHTML', @html.to_xhtml(:encoding => 'UTF-8') assert_match 'UTF-8', @html.to_xhtml(:encoding => 'UTF-8') end def test_no_xml_header html = Nokogiri::HTML(<<-eohtml) eohtml assert html.to_html.length > 0, 'html length is too short' assert_no_match(/^<\?xml/, html.to_html) end def test_document_has_error html = Nokogiri::HTML(<<-eohtml)

      Rainbow Dash

      eohtml assert_equal "html", html.internal_subset.name assert_equal "-//W3C//DTD XHTML 1.1//EN", html.internal_subset.external_id assert_equal "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd", html.internal_subset.system_id assert_equal "", html.to_s[0,97] end def test_content_size html = Nokogiri::HTML('
      ') assert_equal 1, html.content.size assert_equal 1, html.content.split("").size assert_equal "\n", html.content end def test_find_by_xpath found = @html.xpath('//div/a') assert_equal 3, found.length end def test_find_by_css found = @html.css('div > a') assert_equal 3, found.length end def test_find_by_css_with_square_brackets found = @html.css("div[@id='header'] > h1") found = @html.css("div[@id='header'] h1") # this blows up on commit 6fa0f6d329d9dbf1cc21c0ac72f7e627bb4c05fc assert_equal 1, found.length end def test_find_with_function assert @html.css("div:awesome() h1", Class.new { def awesome divs [divs.first] end }.new) end def test_dup_shallow found = @html.search('//div/a').first dup = found.dup(0) assert dup assert_equal '', dup.content end def test_search_can_handle_xpath_and_css found = @html.search('//div/a', 'div > p') length = @html.xpath('//div/a').length + @html.css('div > p').length assert_equal length, found.length end def test_dup_document assert dup = @html.dup assert_not_equal dup, @html assert @html.html? assert_instance_of Nokogiri::HTML::Document, dup assert dup.html?, 'duplicate should be html' assert_equal @html.to_s, dup.to_s end def test_dup_document_shallow assert dup = @html.dup(0) assert_not_equal dup, @html end def test_dup found = @html.search('//div/a').first dup = found.dup assert dup assert_equal found.content, dup.content assert_equal found.document, dup.document end def test_inner_html html = Nokogiri::HTML(<<-eohtml)

      Hello world!

      eohtml node = html.xpath('//div').first assert_equal('

      Helloworld!

      ', node.inner_html.gsub(/\s/, '')) end def test_round_trip doc = Nokogiri::HTML(@html.inner_html) assert_equal @html.root.to_html, doc.root.to_html end def test_fragment_contains_text_node fragment = Nokogiri::HTML.fragment('fooo') assert_equal 1, fragment.children.length assert_equal 'fooo', fragment.inner_text end def test_fragment_includes_two_tags assert_equal 2, Nokogiri::HTML.fragment("

      ").children.length end def test_relative_css_finder doc = Nokogiri::HTML(<<-eohtml)

      inside red

      inside green

      eohtml red_divs = doc.css('div.red') assert_equal 1, red_divs.length p_tags = red_divs.first.css('p') assert_equal 1, p_tags.length assert_equal 'inside red', p_tags.first.text.strip end def test_find_classes doc = Nokogiri::HTML(<<-eohtml)

      RED

      RED

      GREEN

      GREEN

      eohtml list = doc.css('.red') assert_equal 2, list.length assert_equal %w{ RED RED }, list.map { |x| x.text } end def test_parse_can_take_io html = nil File.open(HTML_FILE, 'rb') { |f| html = Nokogiri::HTML(f) } assert html.html? end def test_html? assert !@html.xml? assert @html.html? end def test_serialize assert @html.serialize assert @html.to_html end def test_empty_document # empty document should return "" #699 assert_equal "", Nokogiri::HTML.parse(nil).text assert_equal "", Nokogiri::HTML.parse("").text end end end end nokogiri-1.6.1/test/html/test_element_description.rb0000644000175000017500000000537312261213762022273 0ustar boutilboutilrequire "helper" module Nokogiri module HTML class TestElementDescription < Nokogiri::TestCase def test_fetch_nonexistent assert_nil ElementDescription['foo'] end def test_fetch_element_description assert desc = ElementDescription['a'] assert_instance_of ElementDescription, desc end def test_name assert_equal 'a', ElementDescription['a'].name end def test_implied_start_tag? assert !ElementDescription['a'].implied_start_tag? end def test_implied_end_tag? assert !ElementDescription['a'].implied_end_tag? assert ElementDescription['p'].implied_end_tag? end def test_save_end_tag? assert !ElementDescription['a'].save_end_tag? assert ElementDescription['br'].save_end_tag? end def test_empty? assert ElementDescription['br'].empty? assert !ElementDescription['a'].empty? end def test_deprecated? assert ElementDescription['applet'].deprecated? assert !ElementDescription['br'].deprecated? end def test_inline? assert ElementDescription['a'].inline? assert !ElementDescription['div'].inline? end def test_block? element = ElementDescription['a'] assert_equal(!element.inline?, element.block?) end def test_description assert ElementDescription['a'].description end def test_subelements sub_elements = ElementDescription['body'].sub_elements if Nokogiri.uses_libxml? 
&& Nokogiri::LIBXML_VERSION >= '2.7.7' assert_equal 65, sub_elements.length elsif Nokogiri.uses_libxml? assert_equal 61, sub_elements.length else assert sub_elements.length > 0 end end def test_default_sub_element assert_equal 'div', ElementDescription['body'].default_sub_element end def test_null_default_sub_element doc = Nokogiri::HTML('foo') doc.root.description.default_sub_element end def test_optional_attributes attrs = ElementDescription['table'].optional_attributes assert attrs end def test_deprecated_attributes attrs = ElementDescription['table'].deprecated_attributes assert attrs assert_equal 2, attrs.length end def test_required_attributes attrs = ElementDescription['table'].required_attributes assert attrs assert_equal 0, attrs.length end def test_inspect desc = ElementDescription['input'] assert_match desc.name, desc.inspect end def test_to_s desc = ElementDescription['input'] assert_match desc.name, desc.to_s end end end end nokogiri-1.6.1/test/html/test_node_encoding.rb0000644000175000017500000000130712261213762021023 0ustar boutilboutil# -*- coding: utf-8 -*- require "helper" module Nokogiri module HTML if RUBY_VERSION =~ /^1\.9/ class TestNodeEncoding < Nokogiri::TestCase def test_inner_html doc = Nokogiri::HTML File.open(SHIFT_JIS_HTML, 'rb') hello = "こんにちは" contents = doc.at('h2').inner_html assert_equal doc.encoding, contents.encoding.name assert_match hello.encode('Shift_JIS'), contents contents = doc.at('h2').inner_html(:encoding => 'UTF-8') assert_match hello, contents doc.encoding = 'UTF-8' contents = doc.at('h2').inner_html assert_match hello, contents end end end end end nokogiri-1.6.1/test/html/test_named_characters.rb0000644000175000017500000000052412261213762021513 0ustar boutilboutilrequire "helper" module Nokogiri module HTML class TestNamedCharacters < Nokogiri::TestCase def test_named_character copy = NamedCharacters.get('copy') assert_equal 169, NamedCharacters['copy'] assert_equal copy.value, NamedCharacters['copy'] assert 
copy.description end end end end nokogiri-1.6.1/test/test_css_cache.rb0000644000175000017500000000207212261213762017177 0ustar boutilboutilrequire "helper" class TestCssCache < Nokogiri::TestCase def setup super @css = "a1 > b2 > c3" @parse_result = Nokogiri::CSS.parse(@css) @to_xpath_result = @parse_result.map {|ast| ast.to_xpath} Nokogiri::CSS::Parser.class_eval do class << @cache alias :old_bracket :[] attr_reader :count def [](key) @count ||= 0 @count += 1 old_bracket(key) end end end assert Nokogiri::CSS::Parser.cache_on? end def teardown Nokogiri::CSS::Parser.clear_cache Nokogiri::CSS::Parser.set_cache true end [ false, true ].each do |cache_setting| define_method "test_css_cache_#{cache_setting ? "true" : "false"}" do times = cache_setting ? 4 : nil Nokogiri::CSS::Parser.set_cache cache_setting Nokogiri::CSS.xpath_for(@css) Nokogiri::CSS.xpath_for(@css) Nokogiri::CSS::Parser.new.xpath_for(@css) Nokogiri::CSS::Parser.new.xpath_for(@css) assert_equal(times, Nokogiri::CSS::Parser.class_eval { @cache.count }) end end end nokogiri-1.6.1/.gemtest0000644000175000017500000000000012261213762014357 0ustar boutilboutilnokogiri-1.6.1/build_all0000755000175000017500000000560012261213762014576 0ustar boutilboutil#! /usr/bin/env bash # # script to build gems for all relevant platforms: # - MRI et al (standard gem) # - windows (x86-mingw32 and x86-mswin32-60) # - jruby # # here's what I recommend for building all the gems: # # 1. set up a vagrant VM guest running ubuntu lucid 32-bit. # 2. install rvm, and install 1.9.3, 2.0.0 and jruby. # 3. `sudo apt-get install mingw32` # # as you build, you may run into these problems: # # - if you're using Virtualbox shared directories, you'll get a mingw # "Protocol error" at linktime. Boo! Either use NFS or a # locally-checked-out repository. # # - on ubuntus 11 and later, you may have issues with building # rake-compiler's rubies against openssl v2. Just comment the lines # out from ossl_ssl.c and you'll be fine. 
# # - you may have issues with Pathname conversion to String in # bundler. Add this to the offending bundler file: # # class Pathname # def to_str # to_s # end # end # # - you may also have to hack rubygems.rb to eliminate a reference to # RUBY_ENGINE (just comment it out) # HOST= # Load RVM into a shell session *as a function* if [[ -s "$HOME/.rvm/scripts/rvm" ]] ; then source "$HOME/.rvm/scripts/rvm" elif [[ -s "/usr/local/rvm/scripts/rvm" ]] ; then source "/usr/local/rvm/scripts/rvm" else echo "ERROR: An RVM installation was not found.\n" fi function rvm_use { current_ruby=$1 rvm use "${1}@nokogiri" --create || rvm -v } set -o errexit # initialize rvm_use 1.8.7 bundle install --quiet --local || bundle install rm -rf tmp pkg bundle exec rake clean # holding pen rm -rf gems mkdir -p gems # windows platform=$(uname -i) if [[ $platform =~ "64" ]] ; then echo "" echo "ERROR: You need to build the windows gem on a 32-bit machine!" echo "" exit 1 fi rvm_use 1.8.7 if [[ ! -a ${HOME}/.rake-compiler/ruby/ruby-1.8.7-p358/lib/ruby/1.8.7/x86_64-linux/rbconfig.rb ]] ; then # if this fails around the purelib.rb thing, try varying the ruby # used to run this script, and whether the HOST env var is set # below. bundle exec rake-compiler cross-ruby VERSION=1.8.7-p358 # HOST=i386-mingw32 fi if [[ ! -a ${HOME}/.rake-compiler/ruby/ruby-1.9.3-p194/lib/ruby/1.9.1/x86_64-linux/rbconfig.rb ]] ; then bundle exec rake-compiler cross-ruby VERSION=1.9.3-p194 fi if [[ ! 
-a ${HOME}/.rake-compiler/ruby/ruby-2.0.0-p0/lib/ruby/2.0.0/x86_64-linux/rbconfig.rb ]] ; then bundle exec rake-compiler cross-ruby VERSION=2.0.0-p0 fi bundle exec rake cross bundle exec rake gem:windows cp -v pkg/nokogiri*x86-{mingw32,mswin32}*.gem gems # MRI rvm_use 1.8.7 bundle exec rake gem cp -v pkg/nokogiri*.gem gems # should only be one at this point in the script # jruby rvm_use jruby bundle install --quiet --local || bundle install bundle exec rake clean clobber rvm_use 1.8.7 bundle exec rake generate rvm_use jruby bundle exec rake gem cp -v pkg/nokogiri*java.gem gems nokogiri-1.6.1/lib/0000755000175000017500000000000012261213762013466 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri.rb0000644000175000017500000000747412261213762015650 0ustar boutilboutil# -*- coding: utf-8 -*- # Modify the PATH on windows so that the external DLLs will get loaded. require 'rbconfig' ENV['PATH'] = [File.expand_path( File.join(File.dirname(__FILE__), "..", "ext", "nokogiri") ), ENV['PATH']].compact.join(';') if RbConfig::CONFIG['host_os'] =~ /(mswin|mingw)/i if defined?(RUBY_ENGINE) && RUBY_ENGINE == "jruby" # The line below caused a problem on non-GAE rack environment. # unless defined?(JRuby::Rack::VERSION) || defined?(AppEngine::ApiProxy) # # However, simply cutting defined?(JRuby::Rack::VERSION) off resulted in # an unable-to-load-nokogiri problem. Thus, now, Nokogiri checks the presence # of appengine-rack.jar in $LOAD_PATH. If Nokogiri is on GAE, Nokogiri # should skip loading xml jars. This is because those are in WEB-INF/lib and # already set in the classpath. 
unless $LOAD_PATH.to_s.include?("appengine-rack") require 'stringio' require 'isorelax.jar' require 'jing.jar' require 'nekohtml.jar' require 'nekodtd.jar' require 'xercesImpl.jar' end end require 'nokogiri/nokogiri' require 'nokogiri/version' require 'nokogiri/syntax_error' require 'nokogiri/xml' require 'nokogiri/xslt' require 'nokogiri/html' require 'nokogiri/decorators/slop' require 'nokogiri/css' require 'nokogiri/html/builder' # Nokogiri parses and searches XML/HTML very quickly, and also has # correctly implemented CSS3 selector support as well as XPath support. # # Parsing a document returns either a Nokogiri::XML::Document, or a # Nokogiri::HTML::Document depending on the kind of document you parse. # # Here is an example: # # require 'nokogiri' # require 'open-uri' # # # Get a Nokogiri::HTML:Document for the page we’re interested in... # # doc = Nokogiri::HTML(open('http://www.google.com/search?q=tenderlove')) # # # Do funky things with it using Nokogiri::XML::Node methods... # # #### # # Search for nodes by css # doc.css('h3.r a.l').each do |link| # puts link.content # end # # See Nokogiri::XML::Node#css for more information about CSS searching. # See Nokogiri::XML::Node#xpath for more information about XPath searching. module Nokogiri class << self ### # Parse an HTML or XML document. +string+ contains the document. def parse string, url = nil, encoding = nil, options = nil doc = if string.respond_to?(:read) || string =~ /^\s*<[^Hh>]*html/i # Probably html Nokogiri.HTML( string, url, encoding, options || XML::ParseOptions::DEFAULT_HTML ) else Nokogiri.XML(string, url, encoding, options || XML::ParseOptions::DEFAULT_XML) end yield doc if block_given? doc end ### # Create a new Nokogiri::XML::DocumentFragment def make input = nil, opts = {}, &blk if input Nokogiri::HTML.fragment(input).children.first else Nokogiri(&blk) end end ### # Parse a document and add the Slop decorator. 
The Slop decorator # implements method_missing such that methods may be used instead of CSS # or XPath. For example: # # doc = Nokogiri::Slop(<<-eohtml) # # #

      first

      #

      second

      # # # eohtml # assert_equal('second', doc.html.body.p[1].text) # def Slop(*args, &block) Nokogiri(*args, &block).slop! end end end ### # Parse a document contained in +args+. Nokogiri will try to guess what # type of document you are attempting to parse. For more information, see # Nokogiri.parse # # To specify the type of document, use Nokogiri.XML or Nokogiri.HTML. def Nokogiri(*args, &block) if block_given? builder = Nokogiri::HTML::Builder.new(&block) return builder.doc.root else Nokogiri.parse(*args) end end nokogiri-1.6.1/lib/xsd/0000755000175000017500000000000012261213762014264 5ustar boutilboutilnokogiri-1.6.1/lib/xsd/xmlparser/0000755000175000017500000000000012261213762016301 5ustar boutilboutilnokogiri-1.6.1/lib/xsd/xmlparser/nokogiri.rb0000644000175000017500000000573212261213762020456 0ustar boutilboutilrequire 'nokogiri' module XSD # :nodoc: module XMLParser # :nodoc: ### # Nokogiri XML parser for soap4r. # # Nokogiri may be used as the XML parser in soap4r. Simply require # 'xsd/xmlparser/nokogiri' in your soap4r applications, and soap4r # will use Nokogiri as its XML parser. No other changes should be # required to use Nokogiri as the XML parser. 
# # Example (using UW ITS Web Services): # # require 'rubygems' # require 'nokogiri' # gem 'soap4r' # require 'defaultDriver' # require 'xsd/xmlparser/nokogiri' # # obj = AvlPortType.new # obj.getLatestByRoute(obj.getAgencies.first, 8).each do |bus| # p "#{bus.routeID}, #{bus.longitude}, #{bus.latitude}" # end # class Nokogiri < XSD::XMLParser::Parser ### # Create a new XSD parser with +host+ and +opt+ def initialize host, opt = {} super @parser = ::Nokogiri::XML::SAX::Parser.new(self, @charset || 'UTF-8') end ### # Start parsing +string_or_readable+ def do_parse string_or_readable @parser.parse(string_or_readable) end ### # Handle the start_element event with +name+ and +attrs+ def start_element name, attrs = [] super(name, Hash[*attrs.flatten]) end ### # Handle the end_element event with +name+ def end_element name super end ### # Handle errors with message +msg+ def error msg raise ParseError.new(msg) end alias :warning :error ### # Handle cdata_blocks containing +string+ def cdata_block string characters string end ### # Called at the beginning of an element # +name+ is the element name # +attrs+ is a list of attributes # +prefix+ is the namespace prefix for the element # +uri+ is the associated namespace URI # +ns+ is a hash of namespace prefix:urls associated with the element def start_element_namespace name, attrs = [], prefix = nil, uri = nil, ns = [] ### # Deal with SAX v1 interface name = [prefix, name].compact.join(':') attributes = ns.map { |ns_prefix,ns_uri| [['xmlns', ns_prefix].compact.join(':'), ns_uri] } + attrs.map { |attr| [[attr.prefix, attr.localname].compact.join(':'), attr.value] }.flatten start_element name, attributes end ### # Called at the end of an element # +name+ is the element's name # +prefix+ is the namespace prefix associated with the element # +uri+ is the associated namespace URI def end_element_namespace name, prefix = nil, uri = nil ### # Deal with SAX v1 interface end_element [prefix, name].compact.join(':') end %w{ xmldecl 
start_document end_document comment }.each do |name| class_eval %{ def #{name}(*args); end } end add_factory(self) end end end nokogiri-1.6.1/lib/nokogiri/0000755000175000017500000000000012261213762015307 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/html.rb0000644000175000017500000000225612261213762016605 0ustar boutilboutilrequire 'nokogiri/html/entity_lookup' require 'nokogiri/html/document' require 'nokogiri/html/document_fragment' require 'nokogiri/html/sax/parser_context' require 'nokogiri/html/sax/parser' require 'nokogiri/html/sax/push_parser' require 'nokogiri/html/element_description' require 'nokogiri/html/element_description_defaults' module Nokogiri class << self ### # Parse HTML. Convenience method for Nokogiri::HTML::Document.parse def HTML thing, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block Nokogiri::HTML::Document.parse(thing, url, encoding, options, &block) end end module HTML class << self ### # Parse HTML. Convenience method for Nokogiri::HTML::Document.parse def parse thing, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block Document.parse(thing, url, encoding, options, &block) end #### # Parse a fragment from +string+ in to a NodeSet. def fragment string, encoding = nil HTML::DocumentFragment.parse string, encoding end end # Instance of Nokogiri::HTML::EntityLookup NamedCharacters = EntityLookup.new end end nokogiri-1.6.1/lib/nokogiri/xslt/0000755000175000017500000000000012261213762016301 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/xslt/stylesheet.rb0000644000175000017500000000143712261213762021024 0ustar boutilboutilmodule Nokogiri module XSLT ### # A Stylesheet represents an XSLT Stylesheet object. Stylesheet creation # is done through Nokogiri.XSLT. 
Here is an example of transforming # an XML::Document with a Stylesheet: # # doc = Nokogiri::XML(File.read('some_file.xml')) # xslt = Nokogiri::XSLT(File.read('some_transformer.xslt')) # # puts xslt.transform(doc) # # See Nokogiri::XSLT::Stylesheet#transform for more transformation # information. class Stylesheet ### # Apply an XSLT stylesheet to an XML::Document. # +params+ is an array of strings used as XSLT parameters. # returns serialized document def apply_to document, params = [] serialize(transform(document, params)) end end end end nokogiri-1.6.1/lib/nokogiri/xslt.rb0000644000175000017500000000252412261213762016631 0ustar boutilboutilrequire 'nokogiri/xslt/stylesheet' module Nokogiri class << self ### # Create a Nokogiri::XSLT::Stylesheet with +stylesheet+. # # Example: # # xslt = Nokogiri::XSLT(File.read(ARGV[0])) # def XSLT stylesheet, modules = {} XSLT.parse(stylesheet, modules) end end ### # See Nokogiri::XSLT::Stylesheet for creating and manipulating # Stylesheet object. module XSLT class << self ### # Parse the stylesheet in +string+, register any +modules+ def parse string, modules = {} modules.each do |url, klass| XSLT.register url, klass end if Nokogiri.jruby? Stylesheet.parse_stylesheet_doc(XML.parse(string), string) else Stylesheet.parse_stylesheet_doc(XML.parse(string)) end end ### # Quote parameters in +params+ for stylesheet safety def quote_params params parray = (params.instance_of?(Hash) ? 
params.to_a.flatten : params).dup parray.each_with_index do |v,i| if i % 2 > 0 parray[i]= if v =~ /'/ "concat('#{ v.gsub(/'/, %q{', "'", '}) }')" else "'#{v}'"; end else parray[i] = v.to_s end end parray.flatten end end end end nokogiri-1.6.1/lib/nokogiri/syntax_error.rb0000644000175000017500000000010012261213762020362 0ustar boutilboutilmodule Nokogiri class SyntaxError < ::StandardError end end nokogiri-1.6.1/lib/nokogiri/xml/0000755000175000017500000000000012261213762016107 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/xml/builder.rb0000644000175000017500000002777012261213762020077 0ustar boutilboutilmodule Nokogiri module XML ### # Nokogiri builder can be used for building XML and HTML documents. # # == Synopsis: # # builder = Nokogiri::XML::Builder.new do |xml| # xml.root { # xml.products { # xml.widget { # xml.id_ "10" # xml.name "Awesome widget" # } # } # } # end # puts builder.to_xml # # Will output: # # # # # # 10 # Awesome widget # # # # # # === Builder scope # # The builder allows two forms. When the builder is supplied with a block # that has a parameter, the outside scope is maintained. This means you # can access variables that are outside your builder. If you don't need # outside scope, you can use the builder without the "xml" prefix like # this: # # builder = Nokogiri::XML::Builder.new do # root { # products { # widget { # id_ "10" # name "Awesome widget" # } # } # } # end # # == Special Tags # # The builder works by taking advantage of method_missing. Unfortunately # some methods are defined in ruby that are difficult or dangerous to # remove. You may want to create tags with the name "type", "class", and # "id" for example. In that case, you can use an underscore to # disambiguate your tag name from the method call. 
# # Here is an example of using the underscore to disambiguate tag names from # ruby methods: # # @objects = [Object.new, Object.new, Object.new] # # builder = Nokogiri::XML::Builder.new do |xml| # xml.root { # xml.objects { # @objects.each do |o| # xml.object { # xml.type_ o.type # xml.class_ o.class.name # xml.id_ o.id # } # end # } # } # end # puts builder.to_xml # # The underscore may be used with any tag name, and the last underscore # will just be removed. This code will output the following XML: # # # # # # Object # Object # 48390 # # # Object # Object # 48380 # # # Object # Object # 48370 # # # # # == Tag Attributes # # Tag attributes may be supplied as method arguments. Here is our # previous example, but using attributes rather than tags: # # @objects = [Object.new, Object.new, Object.new] # # builder = Nokogiri::XML::Builder.new do |xml| # xml.root { # xml.objects { # @objects.each do |o| # xml.object(:type => o.type, :class => o.class, :id => o.id) # end # } # } # end # puts builder.to_xml # # === Tag Attribute Short Cuts # # A couple attribute short cuts are available when building tags. The # short cuts are available by special method calls when building a tag. # # This example builds an "object" tag with the class attribute "classy" # and the id of "thing": # # builder = Nokogiri::XML::Builder.new do |xml| # xml.root { # xml.objects { # xml.object.classy.thing! # } # } # end # puts builder.to_xml # # Which will output: # # # # # # # # # All other options are still supported with this syntax, including # blocks and extra tag attributes. # # == Namespaces # # Namespaces are added similarly to attributes. 
Nokogiri::XML::Builder # assumes that when an attribute starts with "xmlns", it is meant to be # a namespace: # # builder = Nokogiri::XML::Builder.new { |xml| # xml.root('xmlns' => 'default', 'xmlns:foo' => 'bar') do # xml.tenderlove # end # } # puts builder.to_xml # # Will output XML like this: # # # # # # # === Referencing declared namespaces # # Tags that reference non-default namespaces (i.e. a tag "foo:bar") can be # built by using the Nokogiri::XML::Builder#[] method. # # For example: # # builder = Nokogiri::XML::Builder.new do |xml| # xml.root('xmlns:foo' => 'bar') { # xml.objects { # xml['foo'].object.classy.thing! # } # } # end # puts builder.to_xml # # Will output this XML: # # # # # # # # # Note the "foo:object" tag. # # == Document Types # # To create a document type (DTD), use the Builder#doc method to get # the current context document. Then call Node#create_internal_subset to # create the DTD node. # # For example, this Ruby: # # builder = Nokogiri::XML::Builder.new do |xml| # xml.doc.create_internal_subset( # 'html', # "-//W3C//DTD HTML 4.01 Transitional//EN", # "http://www.w3.org/TR/html4/loose.dtd" # ) # xml.root do # xml.foo # end # end # # puts builder.to_xml # # Will output this xml: # # # # # # # class Builder # The current Document object being built attr_accessor :doc # The parent of the current node being built attr_accessor :parent # A context object for use when the block has no arguments attr_accessor :context attr_accessor :arity # :nodoc: ### # Create a builder with an existing root object. This is for use when # you have an existing document that you would like to augment with # builder methods. The builder context created will start with the # given +root+ node. # # For example: # # doc = Nokogiri::XML(open('somedoc.xml')) # Nokogiri::XML::Builder.with(doc.at('some_tag')) do |xml| # # ... Use normal builder methods here ... 
# xml.awesome # add the "awesome" tag below "some_tag" # end # def self.with root, &block new({}, root, &block) end ### # Create a new Builder object. +options+ are sent to the top level # Document that is being built. # # Building a document with a particular encoding for example: # # Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml| # ... # end def initialize options = {}, root = nil, &block if root @doc = root.document @parent = root else namespace = self.class.name.split('::') namespace[-1] = 'Document' @doc = eval(namespace.join('::')).new @parent = @doc end @context = nil @arity = nil @ns = nil options.each do |k,v| @doc.send(:"#{k}=", v) end return unless block_given? @arity = block.arity if @arity <= 0 @context = eval('self', block.binding) instance_eval(&block) else yield self end @parent = @doc end ### # Create a Text Node with content of +string+ def text string insert @doc.create_text_node(string) end ### # Create a CDATA Node with content of +string+ def cdata string insert doc.create_cdata(string) end ### # Create a Comment Node with content of +string+ def comment string insert doc.create_comment(string) end ### # Build a tag that is associated with namespace +ns+. Raises an # ArgumentError if +ns+ has not been defined higher in the tree. def [] ns if @parent != @doc @ns = @parent.namespace_definitions.find { |x| x.prefix == ns.to_s } end return self if @ns @parent.ancestors.each do |a| next if a == doc @ns = a.namespace_definitions.find { |x| x.prefix == ns.to_s } return self if @ns end @ns = { :pending => ns.to_s } return self end ### # Convert this Builder object to XML def to_xml(*args) if Nokogiri.jruby? options = args.first.is_a?(Hash) ? 
args.shift : {} if !options[:save_with] options[:save_with] = Node::SaveOptions::AS_BUILDER end args.insert(0, options) end @doc.to_xml(*args) end ### # Append the given raw XML +string+ to the document def << string @doc.fragment(string).children.each { |x| insert(x) } end def method_missing method, *args, &block # :nodoc: if @context && @context.respond_to?(method) @context.send(method, *args, &block) else node = @doc.create_element(method.to_s.sub(/[_!]$/, ''),*args) { |n| # Set up the namespace if @ns.is_a? Nokogiri::XML::Namespace n.namespace = @ns @ns = nil end } if @ns.is_a? Hash node.namespace = node.namespace_definitions.find { |x| x.prefix == @ns[:pending] } if node.namespace.nil? raise ArgumentError, "Namespace #{@ns[:pending]} has not been defined" end @ns = nil end insert(node, &block) end end private ### # Insert +node+ as a child of the current Node def insert(node, &block) node.parent = @parent if block_given? old_parent = @parent @parent = node @arity ||= block.arity if @arity <= 0 instance_eval(&block) else block.call(self) end @parent = old_parent end NodeBuilder.new(node, self) end class NodeBuilder # :nodoc: def initialize node, doc_builder @node = node @doc_builder = doc_builder end def []= k, v @node[k] = v end def [] k @node[k] end def method_missing(method, *args, &block) opts = args.last.is_a?(Hash) ? args.pop : {} case method.to_s when /^(.*)!$/ @node['id'] = $1 @node.content = args.first if args.first when /^(.*)=/ @node[$1] = args.first else @node['class'] = ((@node['class'] || '').split(/\s/) + [method.to_s]).join(' ') @node.content = args.first if args.first end # Assign any extra options opts.each do |k,v| @node[k.to_s] = ((@node[k.to_s] || '').split(/\s/) + [v]).join(' ') end if block_given? 
old_parent = @doc_builder.parent @doc_builder.parent = @node value = @doc_builder.instance_eval(&block) @doc_builder.parent = old_parent return value end self end end end end end nokogiri-1.6.1/lib/nokogiri/xml/sax.rb0000644000175000017500000000022712261213762017230 0ustar boutilboutilrequire 'nokogiri/xml/sax/document' require 'nokogiri/xml/sax/parser_context' require 'nokogiri/xml/sax/parser' require 'nokogiri/xml/sax/push_parser' nokogiri-1.6.1/lib/nokogiri/xml/xpath.rb0000644000175000017500000000031012261213762017552 0ustar boutilboutilrequire 'nokogiri/xml/xpath/syntax_error' module Nokogiri module XML class XPath # The Nokogiri::XML::Document tied to this XPath instance attr_accessor :document end end end nokogiri-1.6.1/lib/nokogiri/xml/xpath/0000755000175000017500000000000012261213762017233 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/xml/xpath/syntax_error.rb0000644000175000017500000000030212261213762022312 0ustar boutilboutilmodule Nokogiri module XML class XPath class SyntaxError < XML::SyntaxError def to_s [super.chomp, str1].compact.join(': ') end end end end end nokogiri-1.6.1/lib/nokogiri/xml/cdata.rb0000644000175000017500000000027112261213762017510 0ustar boutilboutilmodule Nokogiri module XML class CDATA < Nokogiri::XML::Text ### # Get the name of this CDATA node def name '#cdata-section' end end end end nokogiri-1.6.1/lib/nokogiri/xml/element_content.rb0000644000175000017500000000152612261213762021623 0ustar boutilboutilmodule Nokogiri module XML ### # Represents the allowed content in an Element Declaration inside a DTD: # # # # ]> # # # ElementContent represents the tree inside the tag shown above # that lists the possible content for the div1 tag. 
class ElementContent # Possible definitions of type PCDATA = 1 ELEMENT = 2 SEQ = 3 OR = 4 # Possible content occurrences ONCE = 1 OPT = 2 MULT = 3 PLUS = 4 attr_reader :document ### # Get the children of this ElementContent node def children [c1, c2].compact end end end end nokogiri-1.6.1/lib/nokogiri/xml/node.rb0000644000175000017500000007561212261213762017374 0ustar boutilboutilrequire 'stringio' require 'nokogiri/xml/node/save_options' module Nokogiri module XML #### # Nokogiri::XML::Node is your window to the fun filled world of dealing # with XML and HTML tags. A Nokogiri::XML::Node may be treated similarly # to a hash with regard to attributes. For example (from irb): # # irb(main):004:0> node # => link # irb(main):005:0> node['href'] # => "#foo" # irb(main):006:0> node.keys # => ["href", "id"] # irb(main):007:0> node.values # => ["#foo", "link"] # irb(main):008:0> node['class'] = 'green' # => "green" # irb(main):009:0> node # => link # irb(main):010:0> # # See Nokogiri::XML::Node#[] and Nokogiri::XML#[]= for more information. # # Nokogiri::XML::Node also has methods that let you move around your # tree. For navigating your tree, see: # # * Nokogiri::XML::Node#parent # * Nokogiri::XML::Node#children # * Nokogiri::XML::Node#next # * Nokogiri::XML::Node#previous # # You may search this node's subtree using Node#xpath and Node#css class Node include Nokogiri::XML::PP::Node include Enumerable # Element node type, see Nokogiri::XML::Node#element? ELEMENT_NODE = 1 # Attribute node type ATTRIBUTE_NODE = 2 # Text node type, see Nokogiri::XML::Node#text? TEXT_NODE = 3 # CDATA node type, see Nokogiri::XML::Node#cdata? CDATA_SECTION_NODE = 4 # Entity reference node type ENTITY_REF_NODE = 5 # Entity node type ENTITY_NODE = 6 # PI node type PI_NODE = 7 # Comment node type, see Nokogiri::XML::Node#comment? COMMENT_NODE = 8 # Document node type, see Nokogiri::XML::Node#xml? 
DOCUMENT_NODE = 9 # Document type node type DOCUMENT_TYPE_NODE = 10 # Document fragment node type DOCUMENT_FRAG_NODE = 11 # Notation node type NOTATION_NODE = 12 # HTML document node type, see Nokogiri::XML::Node#html? HTML_DOCUMENT_NODE = 13 # DTD node type DTD_NODE = 14 # Element declaration type ELEMENT_DECL = 15 # Attribute declaration type ATTRIBUTE_DECL = 16 # Entity declaration type ENTITY_DECL = 17 # Namespace declaration type NAMESPACE_DECL = 18 # XInclude start type XINCLUDE_START = 19 # XInclude end type XINCLUDE_END = 20 # DOCB document node type DOCB_DOCUMENT_NODE = 21 def initialize name, document # :nodoc: # ... Ya. This is empty on purpose. end ### # Decorate this node with the decorators set up in this node's Document def decorate! document.decorate(self) end ### # Search this node for +paths+. +paths+ can be XPath or CSS, and an # optional hash of namespaces may be appended. # See Node#xpath and Node#css. def search *paths # TODO use paths, handler, ns, binds = extract_params(paths) ns = paths.last.is_a?(Hash) ? paths.pop : (document.root ? document.root.namespaces : {}) prefix = "#{implied_xpath_context}/" xpath(*(paths.map { |path| path = path.to_s path =~ /^(\.\/|\/|\.\.|\.$)/ ? path : CSS.xpath_for( path, :prefix => prefix, :ns => ns ) }.flatten.uniq) + [ns]) end alias :/ :search ### # call-seq: xpath *paths, [namespace-bindings, variable-bindings, custom-handler-class] # # Search this node for XPath +paths+. +paths+ must be one or more XPath # queries. # # node.xpath('.//title') # # A hash of namespace bindings may be appended. For example: # # node.xpath('.//foo:name', {'foo' => 'http://example.org/'}) # node.xpath('.//xmlns:name', node.root.namespaces) # # A hash of variable bindings may also be appended to the namespace bindings. For example: # # node.xpath('.//address[@domestic=$value]', nil, {:value => 'Yes'}) # # Custom XPath functions may also be defined. 
To define custom # functions create a class and implement the function you want # to define. The first argument to the method will be the # current matching NodeSet. Any other arguments are ones that # you pass in. Note that this class may appear anywhere in the # argument list. For example: # # node.xpath('.//title[regex(., "\w+")]', Class.new { # def regex node_set, regex # node_set.find_all { |node| node['some_attribute'] =~ /#{regex}/ } # end # }.new) # def xpath *paths return NodeSet.new(document) unless document paths, handler, ns, binds = extract_params(paths) sets = paths.map { |path| ctx = XPathContext.new(self) ctx.register_namespaces(ns) path = path.gsub(/xmlns:/, ' :') unless Nokogiri.uses_libxml? binds.each do |key,value| ctx.register_variable key.to_s, value end if binds ctx.evaluate(path, handler) } return sets.first if sets.length == 1 NodeSet.new(document) do |combined| sets.each do |set| set.each do |node| combined << node end end end end ### # call-seq: css *rules, [namespace-bindings, custom-pseudo-class] # # Search this node for CSS +rules+. +rules+ must be one or more CSS # selectors. For example: # # node.css('title') # node.css('body h1.bold') # node.css('div + p.green', 'div#one') # # A hash of namespace bindings may be appended. For example: # # node.css('bike|tire', {'bike' => 'http://schwinn.com/'}) # # Custom CSS pseudo classes may also be defined. To define # custom pseudo classes, create a class and implement the custom # pseudo class you want defined. The first argument to the # method will be the current matching NodeSet. Any other # arguments are ones that you pass in. For example: # # node.css('title:regex("\w+")', Class.new { # def regex node_set, regex # node_set.find_all { |node| node['some_attribute'] =~ /#{regex}/ } # end # }.new) # # Note that the CSS query string is case-sensitive with regards # to your document type. 
That is, if you're looking for "H1" in # an HTML document, you'll never find anything, since HTML tags # will match only lowercase CSS queries. However, "H1" might be # found in an XML document, where tags names are case-sensitive # (e.g., "H1" is distinct from "h1"). # def css *rules rules, handler, ns, binds = extract_params(rules) prefix = "#{implied_xpath_context}/" rules = rules.map { |rule| CSS.xpath_for(rule, :prefix => prefix, :ns => ns) }.flatten.uniq + [ns, handler, binds].compact xpath(*rules) end ### # Search this node's immediate children using CSS selector +selector+ def > selector ns = document.root.namespaces xpath CSS.xpath_for(selector, :prefix => "./", :ns => ns).first end ### # Search for the first occurrence of +path+. # # Returns nil if nothing is found, otherwise a Node. def at path, ns = document.root ? document.root.namespaces : {} search(path, ns).first end alias :% :at ## # Search this node for the first occurrence of XPath +paths+. # Equivalent to xpath(paths).first # See Node#xpath for more information. # def at_xpath *paths xpath(*paths).first end ## # Search this node for the first occurrence of CSS +rules+. # Equivalent to css(rules).first # See Node#css for more information. # def at_css *rules css(*rules).first end ### # Get the attribute value for the attribute +name+ def [] name get(name.to_s) end ### # Set the attribute value for the attribute +name+ to +value+ def []= name, value set name.to_s, value.to_s end ### # Add +node_or_tags+ as a child of this Node. # +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a string containing markup. # # Returns the reparented node (if +node_or_tags+ is a Node), or NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string). # # Also see related method +<<+. 
def add_child node_or_tags node_or_tags = coerce(node_or_tags) if node_or_tags.is_a?(XML::NodeSet) node_or_tags.each { |n| add_child_node_and_reparent_attrs n } else add_child_node_and_reparent_attrs node_or_tags end node_or_tags end ### # Add +node_or_tags+ as a child of this Node. # +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a string containing markup. # # Returns self, to support chaining of calls (e.g., root << child1 << child2) # # Also see related method +add_child+. def << node_or_tags add_child node_or_tags self end ### # Insert +node_or_tags+ before this Node (as a sibling). # +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a string containing markup. # # Returns the reparented node (if +node_or_tags+ is a Node), or NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string). # # Also see related method +before+. def add_previous_sibling node_or_tags raise ArgumentError.new("A document may not have multiple root nodes.") if parent.is_a?(XML::Document) && !node_or_tags.is_a?(XML::ProcessingInstruction) add_sibling :previous, node_or_tags end ### # Insert +node_or_tags+ after this Node (as a sibling). # +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a string containing markup. # # Returns the reparented node (if +node_or_tags+ is a Node), or NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string). # # Also see related method +after+. def add_next_sibling node_or_tags raise ArgumentError.new("A document may not have multiple root nodes.") if parent.is_a?(XML::Document) add_sibling :next, node_or_tags end #### # Insert +node_or_tags+ before this node (as a sibling). # +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a string containing markup. # # Returns self, to support chaining of calls. # # Also see related method +add_previous_sibling+. 
def before node_or_tags add_previous_sibling node_or_tags self end #### # Insert +node_or_tags+ after this node (as a sibling). # +node_or_tags+ can be a Nokogiri::XML::Node, a Nokogiri::XML::DocumentFragment, or a string containing markup. # # Returns self, to support chaining of calls. # # Also see related method +add_next_sibling+. def after node_or_tags add_next_sibling node_or_tags self end #### # Set the inner html for this Node to +node_or_tags+ # +node_or_tags+ can be a Nokogiri::XML::Node, a Nokogiri::XML::DocumentFragment, or a string containing markup. # # Returns self. # # Also see related method +children=+ def inner_html= node_or_tags self.children = node_or_tags self end #### # Set the inner html for this Node +node_or_tags+ # +node_or_tags+ can be a Nokogiri::XML::Node, a Nokogiri::XML::DocumentFragment, or a string containing markup. # # Returns the reparented node (if +node_or_tags+ is a Node), or NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string). # # Also see related method +inner_html=+ def children= node_or_tags node_or_tags = coerce(node_or_tags) children.unlink if node_or_tags.is_a?(XML::NodeSet) node_or_tags.each { |n| add_child_node_and_reparent_attrs n } else add_child_node_and_reparent_attrs node_or_tags end node_or_tags end #### # Replace this Node with +node_or_tags+. # +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a string containing markup. # # Returns the reparented node (if +node_or_tags+ is a Node), or NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string). # # Also see related method +swap+. def replace node_or_tags # We cannot replace a text node directly, otherwise libxml will return # an internal error at parser.c:13031, I don't know exactly why # libxml is trying to find a parent node that is an element or document # so I can't tell if this is bug in libxml or not. issue #775. if text? 
replacee = Nokogiri::XML::Node.new 'dummy', document add_previous_sibling_node replacee unlink return replacee.replace node_or_tags end node_or_tags = coerce(node_or_tags) if node_or_tags.is_a?(XML::NodeSet) node_or_tags.each { |n| add_previous_sibling n } unlink else replace_node node_or_tags end node_or_tags end #### # Swap this Node for +node_or_tags+ # +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a string containing markup. # # Returns self, to support chaining of calls. # # Also see related method +replace+. def swap node_or_tags replace node_or_tags self end alias :next :next_sibling alias :previous :previous_sibling # :stopdoc: # HACK: This is to work around an RDoc bug alias :next= :add_next_sibling # :startdoc: alias :previous= :add_previous_sibling alias :remove :unlink alias :get_attribute :[] alias :attr :[] alias :set_attribute :[]= alias :text :content alias :inner_text :content alias :has_attribute? :key? alias :name :node_name alias :name= :node_name= alias :type :node_type alias :to_str :text alias :clone :dup alias :elements :element_children #### # Returns a hash containing the node's attributes. The key is # the attribute name without any namespace, the value is a Nokogiri::XML::Attr # representing the attribute. # If you need to distinguish attributes with the same name, with different namespaces # use #attribute_nodes instead. def attributes Hash[attribute_nodes.map { |node| [node.node_name, node] }] end ### # Get the attribute values for this Node. def values attribute_nodes.map { |node| node.value } end ### # Get the attribute names for this Node. def keys attribute_nodes.map { |node| node.node_name } end ### # Iterate over each attribute name and value pair for this Node. def each attribute_nodes.each { |node| yield [node.node_name, node.value] } end ### # Remove the attribute named +name+ def remove_attribute name attributes[name].remove if key? 
name end alias :delete :remove_attribute ### # Returns true if this Node matches +selector+ def matches? selector ancestors.last.search(selector).include?(self) end ### # Create a DocumentFragment containing +tags+ that is relative to _this_ # context node. def fragment tags type = document.html? ? Nokogiri::HTML : Nokogiri::XML type::DocumentFragment.new(document, tags, self) end ### # Parse +string_or_io+ as a document fragment within the context of # *this* node. Returns a XML::NodeSet containing the nodes parsed from # +string_or_io+. def parse string_or_io, options = nil options ||= (document.html? ? ParseOptions::DEFAULT_HTML : ParseOptions::DEFAULT_XML) if Fixnum === options options = Nokogiri::XML::ParseOptions.new(options) end # Give the options to the user yield options if block_given? contents = string_or_io.respond_to?(:read) ? string_or_io.read : string_or_io return Nokogiri::XML::NodeSet.new(document) if contents.empty? ## # This is a horrible hack, but I don't care. See #313 for background. error_count = document.errors.length node_set = in_context(contents, options.to_i) if node_set.empty? and document.errors.length > error_count and options.recover? fragment = Nokogiri::HTML::DocumentFragment.parse contents node_set = fragment.children end node_set end #### # Set the Node's content to a Text node containing +string+. The string gets XML escaped, not interpreted as markup. def content= string self.native_content = encode_special_chars(string.to_s) end ### # Set the parent Node for this Node def parent= parent_node parent_node.add_child(self) parent_node end ### # Returns a Hash of {prefix => value} for all namespaces on this # node and its ancestors. # # This method returns the same namespaces as #namespace_scopes. # # Returns namespaces in scope for self -- those defined on self # element directly or any ancestor node -- as a Hash of # attribute-name/value pairs. 
Note that the keys in this hash # XML attributes that would be used to define this namespace, # such as "xmlns:prefix", not just the prefix. Default namespace # set on self will be included with key "xmlns". However, # default namespaces set on ancestor will NOT be, even if self # has no explicit default namespace. def namespaces Hash[namespace_scopes.map { |nd| key = ['xmlns', nd.prefix].compact.join(':') if RUBY_VERSION >= '1.9' && document.encoding begin key.force_encoding document.encoding rescue ArgumentError end end [key, nd.href] }] end # Returns true if this is a Comment def comment? type == COMMENT_NODE end # Returns true if this is a CDATA def cdata? type == CDATA_SECTION_NODE end # Returns true if this is an XML::Document node def xml? type == DOCUMENT_NODE end # Returns true if this is an HTML::Document node def html? type == HTML_DOCUMENT_NODE end # Returns true if this is a Text node def text? type == TEXT_NODE end # Returns true if this is a DocumentFragment def fragment? type == DOCUMENT_FRAG_NODE end ### # Fetch the Nokogiri::HTML::ElementDescription for this node. Returns # nil on XML documents and on unknown tags. def description return nil if document.xml? Nokogiri::HTML::ElementDescription[name] end ### # Is this a read only node? def read_only? # According to gdome2, these are read-only node types [NOTATION_NODE, ENTITY_NODE, ENTITY_DECL].include?(type) end # Returns true if this is an Element node def element? type == ELEMENT_NODE end alias :elem? :element? ### # Turn this node in to a string. If the document is HTML, this method # returns html. If the document is XML, this method returns XML. def to_s document.xml? ? to_xml : to_html end # Get the inner_html for this node's Node#children def inner_html *args children.map { |x| x.to_html(*args) }.join end # Get the path to this node as a CSS expression def css_path path.split(/\//).map { |part| part.length == 0 ? 
nil : part.gsub(/\[(\d+)\]/, ':nth-of-type(\1)') }.compact.join(' > ') end ### # Get a list of ancestor Nodes for this Node. If +selector+ is given, # the ancestors must match +selector+ def ancestors selector = nil return NodeSet.new(document) unless respond_to?(:parent) return NodeSet.new(document) unless parent parents = [parent] while parents.last.respond_to?(:parent) break unless ctx_parent = parents.last.parent parents << ctx_parent end return NodeSet.new(document, parents) unless selector root = parents.last NodeSet.new(document, parents.find_all { |parent| root.search(selector).include?(parent) }) end ### # Adds a default namespace supplied as a string +url+ href, to self. # The consequence is as if an xmlns attribute with supplied argument were # present in parsed XML. A default namespace set with this method will # not show up in #attributes, but when this node is serialized to XML an # "xmlns" attribute will appear. See also #namespace and #namespace= def default_namespace= url add_namespace_definition(nil, url) end alias :add_namespace :add_namespace_definition ### # Set the default namespace on this node (as would be defined with an # "xmlns=" attribute in XML source), as a Namespace object +ns+. Note that # a Namespace added this way will NOT be serialized as an xmlns attribute # for this node. You probably want #default_namespace= instead, or perhaps # #add_namespace_definition with a nil prefix argument. def namespace= ns return set_namespace(ns) unless ns unless Nokogiri::XML::Namespace === ns raise TypeError, "#{ns.class} can't be coerced into Nokogiri::XML::Namespace" end if ns.document != document raise ArgumentError, 'namespace must be declared on the same document' end set_namespace ns end #### # Yields self and all children to +block+ recursively. def traverse &block children.each{|j| j.traverse(&block) } block.call(self) end ### # Accept a visitor. This method calls "visit" on +visitor+ with self.
def accept visitor visitor.visit(self) end ### # Test to see if this Node is equal to +other+ def == other return false unless other return false unless other.respond_to?(:pointer_id) pointer_id == other.pointer_id end ### # Serialize Node using +options+. Save options can also be set using a # block. See SaveOptions. # # These two statements are equivalent: # # node.serialize(:encoding => 'UTF-8', :save_with => FORMAT | AS_XML) # # or # # node.serialize(:encoding => 'UTF-8') do |config| # config.format.as_xml # end # def serialize *args, &block options = args.first.is_a?(Hash) ? args.shift : { :encoding => args[0], :save_with => args[1] } encoding = options[:encoding] || document.encoding options[:encoding] = encoding outstring = "" if encoding && outstring.respond_to?(:force_encoding) outstring.force_encoding(Encoding.find(encoding)) end io = StringIO.new(outstring) write_to io, options, &block io.string end ### # Serialize this Node to HTML # # doc.to_html # # See Node#write_to for a list of +options+. For formatted output, # use Node#to_xhtml instead. def to_html options = {} to_format SaveOptions::DEFAULT_HTML, options end ### # Serialize this Node to XML using +options+ # # doc.to_xml(:indent => 5, :encoding => 'UTF-8') # # See Node#write_to for a list of +options+ def to_xml options = {} options[:save_with] ||= SaveOptions::DEFAULT_XML serialize(options) end ### # Serialize this Node to XHTML using +options+ # # doc.to_xhtml(:indent => 5, :encoding => 'UTF-8') # # See Node#write_to for a list of +options+ def to_xhtml options = {} to_format SaveOptions::DEFAULT_XHTML, options end ### # Write Node to +io+ with +options+. +options+ modify the output of # this method. Valid options are: # # * +:encoding+ for changing the encoding # * +:indent_text+ the indentation text, defaults to one space # * +:indent+ the number of +:indent_text+ to use, defaults to 2 # * +:save_with+ a combination of SaveOptions constants. 
# # To save with UTF-8 indented twice: # # node.write_to(io, :encoding => 'UTF-8', :indent => 2) # # To save indented with two dashes: # # node.write_to(io, :indent_text => '-', :indent => 2) # def write_to io, *options options = options.first.is_a?(Hash) ? options.shift : {} encoding = options[:encoding] || options[0] if Nokogiri.jruby? save_options = options[:save_with] || options[1] indent_times = options[:indent] || 0 else save_options = options[:save_with] || options[1] || SaveOptions::FORMAT indent_times = options[:indent] || 2 end indent_text = options[:indent_text] || ' ' config = SaveOptions.new(save_options.to_i) yield config if block_given? native_write_to(io, encoding, indent_text * indent_times, config.options) end ### # Write Node as HTML to +io+ with +options+ # # See Node#write_to for a list of +options+ def write_html_to io, options = {} write_format_to SaveOptions::DEFAULT_HTML, io, options end ### # Write Node as XHTML to +io+ with +options+ # # See Node#write_to for a list of +options+ def write_xhtml_to io, options = {} write_format_to SaveOptions::DEFAULT_XHTML, io, options end ### # Write Node as XML to +io+ with +options+ # # doc.write_xml_to io, :encoding => 'UTF-8' # # See Node#write_to for a list of options def write_xml_to io, options = {} options[:save_with] ||= SaveOptions::DEFAULT_XML write_to io, options end ### # Compare two Node objects with respect to their Document. Nodes from # different documents cannot be compared. def <=> other return nil unless other.is_a?(Nokogiri::XML::Node) return nil unless document == other.document compare other end ### # Do xinclude substitution on the subtree below node. If given a block, a # Nokogiri::XML::ParseOptions object initialized from +options+, will be # passed to it, allowing more convenient modification of the parser options.
def do_xinclude options = XML::ParseOptions::DEFAULT_XML, &block options = Nokogiri::XML::ParseOptions.new(options) if Fixnum === options # give options to user yield options if block_given? # call c extension process_xincludes(options.to_i) end def canonicalize(mode=XML::XML_C14N_1_0,inclusive_namespaces=nil,with_comments=false) c14n_root = self document.canonicalize(mode, inclusive_namespaces, with_comments) do |node, parent| tn = node.is_a?(XML::Node) ? node : parent tn == c14n_root || tn.ancestors.include?(c14n_root) end end private def add_sibling next_or_previous, node_or_tags impl = (next_or_previous == :next) ? :add_next_sibling_node : :add_previous_sibling_node iter = (next_or_previous == :next) ? :reverse_each : :each node_or_tags = coerce node_or_tags if node_or_tags.is_a?(XML::NodeSet) if text? pivot = Nokogiri::XML::Node.new 'dummy', document send impl, pivot else pivot = self end node_or_tags.send(iter) { |n| pivot.send impl, n } pivot.unlink if text? else send impl, node_or_tags end node_or_tags end def to_format save_option, options # FIXME: this is a hack around broken libxml versions return dump_html if Nokogiri.uses_libxml? && %w[2 6] === LIBXML_VERSION.split('.')[0..1] options[:save_with] |= save_option if options[:save_with] options[:save_with] = save_option unless options[:save_with] serialize(options) end def write_format_to save_option, io, options # FIXME: this is a hack around broken libxml versions return (io << dump_html) if Nokogiri.uses_libxml? && %w[2 6] === LIBXML_VERSION.split('.')[0..1] options[:save_with] ||= save_option write_to io, options end def extract_params params # :nodoc: # Pop off our custom function handler if it exists handler = params.find { |param| ![Hash, String, Symbol].include?(param.class) } params -= [handler] if handler hashes = [] while Hash === params.last || params.last.nil? hashes << params.pop break if params.empty? end ns, binds = hashes.reverse ns ||= document.root ? 
document.root.namespaces : {} [params, handler, ns, binds] end def coerce data # :nodoc: case data when XML::NodeSet return data when XML::DocumentFragment return data.children when String return fragment(data).children when Document, XML::Attr # unacceptable when XML::Node return data end raise ArgumentError, <<-EOERR Requires a Node, NodeSet or String argument, and cannot accept a #{data.class}. (You probably want to select a node from the Document with at() or search(), or create a new Node via Node.new().) EOERR end def implied_xpath_context "./" end def inspect_attributes [:name, :namespace, :attribute_nodes, :children] end def add_child_node_and_reparent_attrs node add_child_node node node.attribute_nodes.find_all { |a| a.name =~ /:/ }.each do |attr_node| attr_node.remove node[attr_node.name] = attr_node.value end end end end end nokogiri-1.6.1/lib/nokogiri/xml/xpath_context.rb0000644000175000017500000000047712261213762021334 0ustar boutilboutilmodule Nokogiri module XML class XPathContext ### # Register namespaces in +namespaces+ def register_namespaces(namespaces) namespaces.each do |k, v| k = k.to_s.gsub(/.*:/,'') # strip off 'xmlns:' or 'xml:' register_ns(k, v) end end end end end nokogiri-1.6.1/lib/nokogiri/xml/node_set.rb0000644000175000017500000002223312261213762020236 0ustar boutilboutilmodule Nokogiri module XML #### # A NodeSet contains a list of Nokogiri::XML::Node objects. Typically # a NodeSet is return as a result of searching a Document via # Nokogiri::XML::Node#css or Nokogiri::XML::Node#xpath class NodeSet include Enumerable # The Document this NodeSet is associated with attr_accessor :document # Create a NodeSet with +document+ defaulting to +list+ def initialize document, list = [] @document = document document.decorate(self) list.each { |x| self << x } yield self if block_given? end ### # Get the first element of the NodeSet. 
def first n = nil return self[0] unless n list = [] n.times { |i| list << self[i] } list end ### # Get the last element of the NodeSet. def last self[-1] end ### # Is this NodeSet empty? def empty? length == 0 end ### # Returns the index of the first node in self that is == to +node+. Returns nil if no match is found. def index(node) each_with_index { |member, j| return j if member == node } nil end ### # Insert +datum+ before the first Node in this NodeSet def before datum first.before datum end ### # Insert +datum+ after the last Node in this NodeSet def after datum last.after datum end alias :<< :push alias :remove :unlink ### # Search this document for +paths+ # # For more information see Nokogiri::XML::Node#css and # Nokogiri::XML::Node#xpath def search *paths handler = ![ Hash, String, Symbol ].include?(paths.last.class) ? paths.pop : nil ns = paths.last.is_a?(Hash) ? paths.pop : nil sub_set = NodeSet.new(document) paths.each do |path| sub_set += send( path =~ /^(\.\/|\/|\.\.|\.$)/ ? :xpath : :css, *(paths + [ns, handler]).compact ) end document.decorate(sub_set) sub_set end alias :/ :search ### # Search this NodeSet for css +paths+ # # For more information see Nokogiri::XML::Node#css def css *paths handler = ![ Hash, String, Symbol ].include?(paths.last.class) ? paths.pop : nil ns = paths.last.is_a?(Hash) ? paths.pop : nil sub_set = NodeSet.new(document) each do |node| doc = node.document search_ns = ns || (doc.root ? doc.root.namespaces : {}) xpaths = paths.map { |rule| [ CSS.xpath_for(rule.to_s, :prefix => ".//", :ns => search_ns), CSS.xpath_for(rule.to_s, :prefix => "self::", :ns => search_ns) ].join(' | ') } sub_set += node.xpath(*(xpaths + [search_ns, handler].compact)) end document.decorate(sub_set) sub_set end ### # Search this NodeSet for XPath +paths+ # # For more information see Nokogiri::XML::Node#xpath def xpath *paths handler = ![ Hash, String, Symbol ].include?(paths.last.class) ? paths.pop : nil ns = paths.last.is_a?(Hash) ? 
paths.pop : nil sub_set = NodeSet.new(document) each do |node| sub_set += node.xpath(*(paths + [ns, handler].compact)) end document.decorate(sub_set) sub_set end ### # Search this NodeSet's nodes' immediate children using CSS selector +selector+ def > selector ns = document.root.namespaces xpath CSS.xpath_for(selector, :prefix => "./", :ns => ns).first end ### # If path is a string, search this document for +path+ returning the # first Node. Otherwise, index in to the array with +path+. def at path, ns = document.root ? document.root.namespaces : {} return self[path] if path.is_a?(Numeric) search(path, ns).first end alias :% :at ## # Search this NodeSet for the first occurrence of XPath +paths+. # Equivalent to xpath(paths).first # See NodeSet#xpath for more information. # def at_xpath *paths xpath(*paths).first end ## # Search this NodeSet for the first occurrence of CSS +rules+. # Equivalent to css(rules).first # See NodeSet#css for more information. # def at_css *rules css(*rules).first end ### # Filter this list for nodes that match +expr+ def filter expr find_all { |node| node.matches?(expr) } end ### # Append the class attribute +name+ to all Node objects in the NodeSet. def add_class name each do |el| classes = el['class'].to_s.split(/\s+/) el['class'] = classes.push(name).uniq.join " " end self end ### # Remove the class attribute +name+ from all Node objects in the NodeSet. # If +name+ is nil, remove the class attribute from all Nodes in the # NodeSet. def remove_class name = nil each do |el| if name classes = el['class'].to_s.split(/\s+/) if classes.empty? el.delete 'class' else el['class'] = (classes - [name]).uniq.join " " end else el.delete "class" end end self end ### # Set the attribute +key+ to +value+ or the return value of +blk+ # on all Node objects in the NodeSet. def attr key, value = nil, &blk unless Hash === key || key && (value || blk) return first.attribute(key) end hash = key.is_a?(Hash) ? 
key : { key => value } hash.each { |k,v| each { |el| el[k] = v || blk[el] } } self end alias :set :attr alias :attribute :attr ### # Remove the attribute named +name+ from all Node objects in the NodeSet def remove_attr name each { |el| el.delete name } self end ### # Iterate over each node, yielding to +block+ def each(&block) 0.upto(length - 1) do |x| yield self[x] end end ### # Get the inner text of all contained Node objects def inner_text collect{|j| j.inner_text}.join('') end alias :text :inner_text ### # Get the inner html of all contained Node objects def inner_html *args collect{|j| j.inner_html(*args) }.join('') end ### # Wrap this NodeSet with +html+ or the results of the builder in +blk+ def wrap(html, &blk) each do |j| new_parent = document.parse(html).first j.add_next_sibling(new_parent) new_parent.add_child(j) end self end ### # Convert this NodeSet to a string. def to_s map { |x| x.to_s }.join end ### # Convert this NodeSet to HTML def to_html *args if Nokogiri.jruby? options = args.first.is_a?(Hash) ? args.shift : {} if !options[:save_with] options[:save_with] = Node::SaveOptions::NO_DECLARATION | Node::SaveOptions::NO_EMPTY_TAGS | Node::SaveOptions::AS_HTML end args.insert(0, options) end map { |x| x.to_html(*args) }.join end ### # Convert this NodeSet to XHTML def to_xhtml *args map { |x| x.to_xhtml(*args) }.join end ### # Convert this NodeSet to XML def to_xml *args map { |x| x.to_xml(*args) }.join end alias :size :length alias :to_ary :to_a ### # Removes the last element from set and returns it, or +nil+ if # the set is empty def pop return nil if length == 0 delete last end ### # Returns the first element of the NodeSet and removes it. Returns # +nil+ if the set is empty.
def shift return nil if length == 0 delete first end ### # Equality -- Two NodeSets are equal if the contain the same number # of elements and if each element is equal to the corresponding # element in the other NodeSet def == other return false unless other.is_a?(Nokogiri::XML::NodeSet) return false unless length == other.length each_with_index do |node, i| return false unless node == other[i] end true end ### # Returns a new NodeSet containing all the children of all the nodes in # the NodeSet def children inject(NodeSet.new(document)) { |set, node| set += node.children } end ### # Returns a new NodeSet containing all the nodes in the NodeSet # in reverse order def reverse node_set = NodeSet.new(document) (length - 1).downto(0) do |x| node_set.push self[x] end node_set end ### # Return a nicely formated string representation def inspect "[#{map { |c| c.inspect }.join ', '}]" end alias :+ :| end end end nokogiri-1.6.1/lib/nokogiri/xml/attr.rb0000644000175000017500000000036412261213762017411 0ustar boutilboutilmodule Nokogiri module XML class Attr < Node alias :value :content alias :to_s :content alias :content= :value= private def inspect_attributes [:name, :namespace, :value] end end end end nokogiri-1.6.1/lib/nokogiri/xml/dtd.rb0000644000175000017500000000072012261213762017206 0ustar boutilboutilmodule Nokogiri module XML class DTD < Nokogiri::XML::Node undef_method :attribute_nodes undef_method :values undef_method :content undef_method :namespace undef_method :namespace_definitions undef_method :line if method_defined?(:line) def keys attributes.keys end def each &block attributes.each { |key, value| block.call([key, value]) } end end end end nokogiri-1.6.1/lib/nokogiri/xml/sax/0000755000175000017500000000000012261213762016702 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/xml/sax/parser.rb0000644000175000017500000001011712261213762020523 0ustar boutilboutilmodule Nokogiri module XML module SAX ### # This parser is a SAX style parser that reads it's input as 
it # deems necessary. The parser takes a Nokogiri::XML::SAX::Document, # an optional encoding, then given an XML input, sends messages to # the Nokogiri::XML::SAX::Document. # # Here is an example of using this parser: # # # Create a subclass of Nokogiri::XML::SAX::Document and implement # # the events we care about: # class MyDoc < Nokogiri::XML::SAX::Document # def start_element name, attrs = [] # puts "starting: #{name}" # end # # def end_element name # puts "ending: #{name}" # end # end # # # Create our parser # parser = Nokogiri::XML::SAX::Parser.new(MyDoc.new) # # # Send some XML to the parser # parser.parse(File.open(ARGV[0])) # # For more information about SAX parsers, see Nokogiri::XML::SAX. Also # see Nokogiri::XML::SAX::Document for the available events. class Parser class Attribute < Struct.new(:localname, :prefix, :uri, :value) end # Encodinds this parser supports ENCODINGS = { 'NONE' => 0, # No char encoding detected 'UTF-8' => 1, # UTF-8 'UTF16LE' => 2, # UTF-16 little endian 'UTF16BE' => 3, # UTF-16 big endian 'UCS4LE' => 4, # UCS-4 little endian 'UCS4BE' => 5, # UCS-4 big endian 'EBCDIC' => 6, # EBCDIC uh! 'UCS4-2143' => 7, # UCS-4 unusual ordering 'UCS4-3412' => 8, # UCS-4 unusual ordering 'UCS2' => 9, # UCS-2 'ISO-8859-1' => 10, # ISO-8859-1 ISO Latin 1 'ISO-8859-2' => 11, # ISO-8859-2 ISO Latin 2 'ISO-8859-3' => 12, # ISO-8859-3 'ISO-8859-4' => 13, # ISO-8859-4 'ISO-8859-5' => 14, # ISO-8859-5 'ISO-8859-6' => 15, # ISO-8859-6 'ISO-8859-7' => 16, # ISO-8859-7 'ISO-8859-8' => 17, # ISO-8859-8 'ISO-8859-9' => 18, # ISO-8859-9 'ISO-2022-JP' => 19, # ISO-2022-JP 'SHIFT-JIS' => 20, # Shift_JIS 'EUC-JP' => 21, # EUC-JP 'ASCII' => 22, # pure ASCII } # The Nokogiri::XML::SAX::Document where events will be sent. attr_accessor :document # The encoding beings used for this document. 
attr_accessor :encoding # Create a new Parser with +doc+ and +encoding+ def initialize doc = Nokogiri::XML::SAX::Document.new, encoding = 'UTF-8' check_encoding(encoding) @encoding = encoding @document = doc @warned = false end ### # Parse given +thing+ which may be a string containing xml, or an # IO object. def parse thing, &block if thing.respond_to?(:read) && thing.respond_to?(:close) parse_io(thing, &block) else parse_memory(thing, &block) end end ### # Parse given +io+ def parse_io io, encoding = 'ASCII' check_encoding(encoding) @encoding = encoding ctx = ParserContext.io(io, ENCODINGS[encoding]) yield ctx if block_given? ctx.parse_with self end ### # Parse a file with +filename+ def parse_file filename raise ArgumentError unless filename raise Errno::ENOENT unless File.exists?(filename) raise Errno::EISDIR if File.directory?(filename) ctx = ParserContext.file filename yield ctx if block_given? ctx.parse_with self end def parse_memory data ctx = ParserContext.memory data yield ctx if block_given? ctx.parse_with self end private def check_encoding(encoding) encoding.upcase! raise ArgumentError.new("'#{encoding}' is not a valid encoding") unless ENCODINGS[encoding] end end end end end nokogiri-1.6.1/lib/nokogiri/xml/sax/document.rb0000644000175000017500000001341212261213762021046 0ustar boutilboutilmodule Nokogiri module XML ### # SAX Parsers are event driven parsers. Nokogiri provides two different # event based parsers when dealing with XML. If you want to do SAX style # parsing using HTML, check out Nokogiri::HTML::SAX. # # The basic way a SAX style parser works is by creating a parser, # telling the parser about the events we're interested in, then giving # the parser some XML to process. The parser will notify you when # it encounters events your said you would like to know about. # # To register for events, you simply subclass Nokogiri::XML::SAX::Document, # and implement the methods for which you would like notification. 
# # For example, if I want to be notified when a document ends, and when an # element starts, I would write a class like this: # # class MyDocument < Nokogiri::XML::SAX::Document # def end_document # puts "the document has ended" # end # # def start_element name, attributes = [] # puts "#{name} started" # end # end # # Then I would instantiate a SAX parser with this document, and feed the # parser some XML # # # Create a new parser # parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new) # # # Feed the parser some XML # parser.parse(File.open(ARGV[0])) # # Now my document handler will be called when each node starts, and when # then document ends. To see what kinds of events are available, take # a look at Nokogiri::XML::SAX::Document. # # Two SAX parsers for XML are available, a parser that reads from a string # or IO object as it feels necessary, and a parser that lets you spoon # feed it XML. If you want to let Nokogiri deal with reading your XML, # use the Nokogiri::XML::SAX::Parser. If you want to have fine grain # control over the XML input, use the Nokogiri::XML::SAX::PushParser. module SAX ### # This class is used for registering types of events you are interested # in handling. All of the methods on this class are available as # possible events while parsing an XML document. To register for any # particular event, just subclass this class and implement the methods # you are interested in knowing about. # # To only be notified about start and end element events, write a class # like this: # # class MyDocument < Nokogiri::XML::SAX::Document # def start_element name, attrs = [] # puts "#{name} started!" # end # # def end_element name # puts "#{name} ended" # end # end # # You can use this event handler for any SAX style parser included with # Nokogiri. See Nokogiri::XML::SAX, and Nokogiri::HTML::SAX. 
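      #
      # A minimal sketch, not part of the original documentation: assuming the
      # MyDocument class above, the same handler can also be given to the
      # push parser, which fires callbacks as chunks arrive. The XML chunks
      # are illustrative only.
      #
      #   parser = Nokogiri::XML::SAX::PushParser.new(MyDocument.new)
      #   parser << "<root><item/>"   # callbacks for complete events may fire here
      #   parser << "</root>"
      #   parser.finish               # needed for end_document to be called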
class Document ### # Called when an XML declaration is parsed def xmldecl version, encoding, standalone end ### # Called when document starts parsing def start_document end ### # Called when document ends parsing def end_document end ### # Called at the beginning of an element # * +name+ is the name of the tag # * +attrs+ are an assoc list of namespaces and attributes, e.g.: # [ ["xmlns:foo", "http://sample.net"], ["size", "large"] ] def start_element name, attrs = [] end ### # Called at the end of an element # +name+ is the tag name def end_element name end ### # Called at the beginning of an element # +name+ is the element name # +attrs+ is a list of attributes # +prefix+ is the namespace prefix for the element # +uri+ is the associated namespace URI # +ns+ is a hash of namespace prefix:urls associated with the element def start_element_namespace name, attrs = [], prefix = nil, uri = nil, ns = [] ### # Deal with SAX v1 interface name = [prefix, name].compact.join(':') attributes = ns.map { |ns_prefix,ns_uri| [['xmlns', ns_prefix].compact.join(':'), ns_uri] } + attrs.map { |attr| [[attr.prefix, attr.localname].compact.join(':'), attr.value] } start_element name, attributes end ### # Called at the end of an element # +name+ is the element's name # +prefix+ is the namespace prefix associated with the element # +uri+ is the associated namespace URI def end_element_namespace name, prefix = nil, uri = nil ### # Deal with SAX v1 interface end_element [prefix, name].compact.join(':') end ### # Characters read between a tag. This method might be called multiple # times given one contiguous string of characters. 
# # +string+ contains the character data def characters string end ### # Called when comments are encountered # +string+ contains the comment data def comment string end ### # Called on document warnings # +string+ contains the warning def warning string end ### # Called on document errors # +string+ contains the error def error string end ### # Called when cdata blocks are found # +string+ contains the cdata content def cdata_block string end ### # Called when processing instructions are found # +name+ is the target of the instruction # +content+ is the value of the instruction def processing_instruction name, content end end end end end nokogiri-1.6.1/lib/nokogiri/xml/sax/parser_context.rb0000644000175000017500000000072012261213762022266 0ustar boutilboutilmodule Nokogiri module XML module SAX ### # Context for XML SAX parsers. This class is usually not instantiated # by the user. Instead, you should be looking at # Nokogiri::XML::SAX::Parser class ParserContext def self.new thing, encoding = 'UTF-8' [:read, :close].all? { |x| thing.respond_to?(x) } ? io(thing, Parser::ENCODINGS[encoding]) : memory(thing) end end end end end nokogiri-1.6.1/lib/nokogiri/xml/sax/push_parser.rb0000644000175000017500000000351112261213762021562 0ustar boutilboutilmodule Nokogiri module XML module SAX ### # PushParser can parse a document that is fed to it manually. It # must be given a SAX::Document object which will be called with # SAX events as the document is being parsed. # # Calling PushParser#<< writes XML to the parser, calling any SAX # callbacks it can. # # PushParser#finish tells the parser that the document is finished # and calls the end_document SAX method. # # Example: # # parser = PushParser.new(Class.new(XML::SAX::Document) { # def start_document # puts "start document called" # end # }.new) # parser << "
      hello<" # parser << "/div>" # parser.finish class PushParser # The Nokogiri::XML::SAX::Document on which the PushParser will be # operating attr_accessor :document ### # Create a new PushParser with +doc+ as the SAX Document, providing # an optional +file_name+ and +encoding+ def initialize(doc = XML::SAX::Document.new, file_name = nil, encoding = 'UTF-8') @document = doc @encoding = encoding @sax_parser = XML::SAX::Parser.new(doc) ## Create our push parser context initialize_native(@sax_parser, file_name) end ### # Write a +chunk+ of XML to the PushParser. Any callback methods # that can be called will be called immediately. def write chunk, last_chunk = false native_write(chunk, last_chunk) end alias :<< :write ### # Finish the parsing. This method is only necessary for # Nokogiri::XML::SAX::Document#end_document to be called. def finish write '', true end end end end end nokogiri-1.6.1/lib/nokogiri/xml/node/0000755000175000017500000000000012261213762017034 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/xml/node/save_options.rb0000644000175000017500000000335112261213762022074 0ustar boutilboutilmodule Nokogiri module XML class Node ### # Save options for serializing nodes class SaveOptions # Format serialized xml FORMAT = 1 # Do not include declarations NO_DECLARATION = 2 # Do not include empty tags NO_EMPTY_TAGS = 4 # Do not save XHTML NO_XHTML = 8 # Save as XHTML AS_XHTML = 16 # Save as XML AS_XML = 32 # Save as HTML AS_HTML = 64 if Nokogiri.jruby? 
# Save builder created document AS_BUILDER = 128 # the default for XML documents DEFAULT_XML = AS_XML # https://github.com/sparklemotion/nokogiri/issues/#issue/415 # the default for HTML document DEFAULT_HTML = NO_DECLARATION | NO_EMPTY_TAGS | AS_HTML else # the default for XML documents DEFAULT_XML = FORMAT | AS_XML # the default for HTML document DEFAULT_HTML = FORMAT | NO_DECLARATION | NO_EMPTY_TAGS | AS_HTML end # the default for XHTML document DEFAULT_XHTML = FORMAT | NO_DECLARATION | NO_EMPTY_TAGS | AS_XHTML # Integer representation of the SaveOptions attr_reader :options # Create a new SaveOptions object with +options+ def initialize options = 0; @options = options; end constants.each do |constant| class_eval %{ def #{constant.downcase} @options |= #{constant} self end def #{constant.downcase}? #{constant} & @options == #{constant} end } end alias :to_i :options end end end end nokogiri-1.6.1/lib/nokogiri/xml/text.rb0000644000175000017500000000025412261213762017421 0ustar boutilboutilmodule Nokogiri module XML class Text < Nokogiri::XML::CharacterData def content=(string) self.native_content = string.to_s end end end end nokogiri-1.6.1/lib/nokogiri/xml/element_decl.rb0000644000175000017500000000050112261213762021050 0ustar boutilboutilmodule Nokogiri module XML class ElementDecl < Nokogiri::XML::Node undef_method :namespace undef_method :namespace_definitions undef_method :line if method_defined?(:line) def inspect "#<#{self.class.name}:#{sprintf("0x%x", object_id)} #{to_s.inspect}>" end end end end nokogiri-1.6.1/lib/nokogiri/xml/document.rb0000644000175000017500000002137412261213762020261 0ustar boutilboutilmodule Nokogiri module XML ## # Nokogiri::XML::Document is the main entry point for dealing with # XML documents. The Document is created by parsing an XML document. # See Nokogiri::XML::Document.parse() for more information on parsing. 
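    #
    # A minimal sketch, not part of the original documentation, of parsing a
    # string and then checking for recovered errors. The XML string is
    # illustrative; +nonet+ and +noblanks+ are Nokogiri::XML::ParseOptions
    # setters, which return the options object and so can be chained.
    #
    #   xml = "<root><child>text</child></root>"
    #   doc = Nokogiri::XML::Document.parse(xml) { |config| config.nonet.noblanks }
    #   doc.errors.each { |error| puts error.message }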
# # For searching a Document, see Nokogiri::XML::Node#css and # Nokogiri::XML::Node#xpath # class Document < Nokogiri::XML::Node # I'm ignoring unicode characters here. # See http://www.w3.org/TR/REC-xml-names/#ns-decl for more details. NCNAME_START_CHAR = "A-Za-z_" NCNAME_CHAR = NCNAME_START_CHAR + "\\-.0-9" NCNAME_RE = /^xmlns(:[#{NCNAME_START_CHAR}][#{NCNAME_CHAR}]*)?$/ ## # Parse an XML file. # # +string_or_io+ may be a String, or any object that responds to # _read_ and _close_ such as an IO, or StringIO. # # +url+ (optional) is the URI where this document is located. # # +encoding+ (optional) is the encoding that should be used when processing # the document. # # +options+ (optional) is a configuration object that sets options during # parsing, such as Nokogiri::XML::ParseOptions::RECOVER. See the # Nokogiri::XML::ParseOptions for more information. # # +block+ (optional) is passed a configuration object on which # parse options may be set. # # When parsing untrusted documents, it's recommended that the # +nonet+ option be used, as shown in this example code: # # Nokogiri::XML::Document.parse(xml_string) { |config| config.nonet } # # Nokogiri.XML() is a convenience method which will call this method. # def self.parse string_or_io, url = nil, encoding = nil, options = ParseOptions::DEFAULT_XML, &block options = Nokogiri::XML::ParseOptions.new(options) if Fixnum === options # Give the options to the user yield options if block_given? return new if empty_doc?(string_or_io) doc = if string_or_io.respond_to?(:read) url ||= string_or_io.respond_to?(:path) ? string_or_io.path : nil read_io(string_or_io, url, encoding, options.to_i) else # read_memory pukes on empty docs read_memory(string_or_io, url, encoding, options.to_i) end # do xinclude processing doc.do_xinclude(options) if options.xinclude? 
        return doc
      end

      # A list of Nokogiri::XML::SyntaxError found when parsing a document
      attr_accessor :errors

      def initialize *args # :nodoc:
        @errors = []
        @decorators = nil
      end

      ##
      # Create an element with +name+, and optionally setting the content and attributes.
      #
      #   doc.create_element "div" # <div></div>
      #   doc.create_element "div", :class => "container" # <div class='container'></div>
      #   doc.create_element "div", "contents" # <div>contents</div>
      #   doc.create_element "div", "contents", :class => "container" # <div class='container'>contents</div>
      #   doc.create_element "div" { |node| node['class'] = "container" } # <div class='container'></div>
      # def create_element name, *args, &block elm = Nokogiri::XML::Element.new(name, self, &block) args.each do |arg| case arg when Hash arg.each { |k,v| key = k.to_s if key =~ NCNAME_RE ns_name = key.split(":", 2)[1] elm.add_namespace_definition ns_name, v else elm[k.to_s] = v.to_s end } else elm.content = arg end end if ns = elm.namespace_definitions.find { |n| n.prefix.nil? or n.prefix == '' } elm.namespace = ns end elm end # Create a Text Node with +string+ def create_text_node string, &block Nokogiri::XML::Text.new string.to_s, self, &block end # Create a CDATA Node containing +string+ def create_cdata string, &block Nokogiri::XML::CDATA.new self, string.to_s, &block end # Create a Comment Node containing +string+ def create_comment string, &block Nokogiri::XML::Comment.new self, string.to_s, &block end # The name of this document. Always returns "document" def name 'document' end # A reference to +self+ def document self end ## # Recursively get all namespaces from this node and its subtree and # return them as a hash. # # For example, given this document: # # # # # # This method will return: # # { 'xmlns:foo' => 'bar', 'xmlns:hello' => 'world' } # # WARNING: this method will clobber duplicate names in the keys. # For example, given this document: # # # # # # The hash returned will look like this: { 'xmlns:foo' => 'bar' } # # Non-prefixed default namespaces (as in "xmlns=") are not included # in the hash. # # Note that this method does an xpath lookup for nodes with # namespaces, and as a result the order may be dependent on the # implementation of the underlying XML library. # def collect_namespaces xpath("//namespace::*").inject({}) do |hash, ns| hash[["xmlns",ns.prefix].compact.join(":")] = ns.href if ns.prefix != "xml" hash end end # Get the list of decorators given +key+ def decorators key @decorators ||= Hash.new @decorators[key] ||= [] end ## # Validate this Document against it's DTD. 
Returns a list of errors on # the document or +nil+ when there is no DTD. def validate return nil unless internal_subset internal_subset.validate self end ## # Explore a document with shortcut methods. See Nokogiri::Slop for details. # # Note that any nodes that have been instantiated before #slop! # is called will not be decorated with sloppy behavior. So, if you're in # irb, the preferred idiom is: # # irb> doc = Nokogiri::Slop my_markup # # and not # # irb> doc = Nokogiri::HTML my_markup # ... followed by irb's implicit inspect (and therefore instantiation of every node) ... # irb> doc.slop! # ... which does absolutely nothing. # def slop! unless decorators(XML::Node).include? Nokogiri::Decorators::Slop decorators(XML::Node) << Nokogiri::Decorators::Slop decorate! end self end ## # Apply any decorators to +node+ def decorate node return unless @decorators @decorators.each { |klass,list| next unless node.is_a?(klass) list.each { |moodule| node.extend(moodule) } } end alias :to_xml :serialize alias :clone :dup # Get the hash of namespaces on the root Nokogiri::XML::Node def namespaces root ? root.namespaces : {} end ## # Create a Nokogiri::XML::DocumentFragment from +tags+ # Returns an empty fragment if +tags+ is nil. def fragment tags = nil DocumentFragment.new(self, tags, self.root) end undef_method :swap, :parent, :namespace, :default_namespace= undef_method :add_namespace_definition, :attributes undef_method :namespace_definitions, :line, :add_namespace def add_child node_or_tags raise "Document already has a root node" if root node_or_tags = coerce(node_or_tags) if node_or_tags.is_a?(XML::NodeSet) raise "Document cannot have multiple root nodes" if node_or_tags.size > 1 super(node_or_tags.first) else super end end alias :<< :add_child ## # +JRuby+ # Wraps Java's org.w3c.dom.document and returns Nokogiri::XML::Document def self.wrap document raise "JRuby only method" unless Nokogiri.jruby? 
return wrapJavaDocument(document) end ## # +JRuby+ # Returns Java's org.w3c.dom.document of this Document. def to_java raise "JRuby only method" unless Nokogiri.jruby? return toJavaDocument() end private def self.empty_doc? string_or_io string_or_io.nil? || (string_or_io.respond_to?(:empty?) && string_or_io.empty?) || (string_or_io.respond_to?(:eof?) && string_or_io.eof?) end def implied_xpath_context "/" end def inspect_attributes [:name, :children] end end end end nokogiri-1.6.1/lib/nokogiri/xml/syntax_error.rb0000644000175000017500000000164412261213762021200 0ustar boutilboutilmodule Nokogiri module XML ### # This class provides information about XML SyntaxErrors. These # exceptions are typically stored on Nokogiri::XML::Document#errors. class SyntaxError < ::Nokogiri::SyntaxError attr_reader :domain attr_reader :code attr_reader :level attr_reader :file attr_reader :line attr_reader :str1 attr_reader :str2 attr_reader :str3 attr_reader :int1 attr_reader :column ### # return true if this is a non error def none? level == 0 end ### # return true if this is a warning def warning? level == 1 end ### # return true if this is an error def error? level == 2 end ### # return true if this error is fatal def fatal? 
level == 3 end def to_s super.chomp end end end end nokogiri-1.6.1/lib/nokogiri/xml/parse_options.rb0000644000175000017500000000530012261213762021317 0ustar boutilboutilmodule Nokogiri module XML ### # Parse options for passing to Nokogiri.XML or Nokogiri.HTML class ParseOptions # Strict parsing STRICT = 0 # Recover from errors RECOVER = 1 << 0 # Substitute entities NOENT = 1 << 1 # Load external subsets DTDLOAD = 1 << 2 # Default DTD attributes DTDATTR = 1 << 3 # validate with the DTD DTDVALID = 1 << 4 # suppress error reports NOERROR = 1 << 5 # suppress warning reports NOWARNING = 1 << 6 # pedantic error reporting PEDANTIC = 1 << 7 # remove blank nodes NOBLANKS = 1 << 8 # use the SAX1 interface internally SAX1 = 1 << 9 # Implement XInclude substitution XINCLUDE = 1 << 10 # Forbid network access. Recommended for dealing with untrusted documents. NONET = 1 << 11 # Do not reuse the context dictionary NODICT = 1 << 12 # remove redundant namespaces declarations NSCLEAN = 1 << 13 # merge CDATA as text nodes NOCDATA = 1 << 14 # do not generate XINCLUDE START/END nodes NOXINCNODE = 1 << 15 # compact small text nodes; no modification of the tree allowed afterwards (will possibly crash if you try to modify the tree) COMPACT = 1 << 16 # parse using XML-1.0 before update 5 OLD10 = 1 << 17 # do not fixup XINCLUDE xml:base uris NOBASEFIX = 1 << 18 # relax any hardcoded limit from the parser HUGE = 1 << 19 # the default options used for parsing XML documents DEFAULT_XML = RECOVER | NONET # the default options used for parsing HTML documents DEFAULT_HTML = RECOVER | NOERROR | NOWARNING | NONET attr_accessor :options def initialize options = STRICT @options = options end constants.each do |constant| next if constant.to_sym == :STRICT class_eval %{ def #{constant.downcase} @options |= #{constant} self end def no#{constant.downcase} @options &= ~#{constant} self end def #{constant.downcase}? 
#{constant} & @options == #{constant} end } end def strict @options &= ~RECOVER self end def strict? @options & RECOVER == STRICT end alias :to_i :options def inspect options = [] self.class.constants.each do |k| options << k.downcase if send(:"#{k.downcase}?") end super.sub(/>$/, " " + options.join(', ') + ">") end end end end nokogiri-1.6.1/lib/nokogiri/xml/schema.rb0000644000175000017500000000337012261213762017677 0ustar boutilboutilmodule Nokogiri module XML class << self ### # Create a new Nokogiri::XML::Schema object using a +string_or_io+ # object. def Schema string_or_io Schema.new(string_or_io) end end ### # Nokogiri::XML::Schema is used for validating XML against a schema # (usually from an xsd file). # # == Synopsis # # Validate an XML document against a Schema. Loop over the errors that # are returned and print them out: # # xsd = Nokogiri::XML::Schema(File.read(PO_SCHEMA_FILE)) # doc = Nokogiri::XML(File.read(PO_XML_FILE)) # # xsd.validate(doc).each do |error| # puts error.message # end # # The list of errors are Nokogiri::XML::SyntaxError objects. class Schema # Errors while parsing the schema file attr_accessor :errors ### # Create a new Nokogiri::XML::Schema object using a +string_or_io+ # object. def self.new string_or_io from_document Nokogiri::XML(string_or_io) end ### # Validate +thing+ against this schema. +thing+ can be a # Nokogiri::XML::Document object, or a filename. An Array of # Nokogiri::XML::SyntaxError objects found while validating the # +thing+ is returned. def validate thing if thing.is_a?(Nokogiri::XML::Document) validate_document(thing) elsif File.file?(thing) validate_file(thing) else raise ArgumentError, "Must provide Nokogiri::Xml::Document or the name of an existing file" end end ### # Returns true if +thing+ is a valid Nokogiri::XML::Document or # file. def valid? 
thing validate(thing).length == 0 end end end end nokogiri-1.6.1/lib/nokogiri/xml/processing_instruction.rb0000644000175000017500000000021512261213762023247 0ustar boutilboutilmodule Nokogiri module XML class ProcessingInstruction < Node def initialize document, name, content end end end end nokogiri-1.6.1/lib/nokogiri/xml/attribute_decl.rb0000644000175000017500000000073512261213762021433 0ustar boutilboutilmodule Nokogiri module XML ### # Represents an attribute declaration in a DTD class AttributeDecl < Nokogiri::XML::Node undef_method :attribute_nodes undef_method :attributes undef_method :content undef_method :namespace undef_method :namespace_definitions undef_method :line if method_defined?(:line) def inspect "#<#{self.class.name}:#{sprintf("0x%x", object_id)} #{to_s.inspect}>" end end end end nokogiri-1.6.1/lib/nokogiri/xml/pp.rb0000644000175000017500000000011012261213762017043 0ustar boutilboutilrequire 'nokogiri/xml/pp/node' require 'nokogiri/xml/pp/character_data' nokogiri-1.6.1/lib/nokogiri/xml/reader.rb0000644000175000017500000000577612261213762017715 0ustar boutilboutilmodule Nokogiri module XML ### # Nokogiri::XML::Reader parses an XML document similar to the way a cursor # would move. The Reader is given an XML document, and yields nodes # to an each block. # # Here is an example of usage: # # reader = Nokogiri::XML::Reader(<<-eoxml) # # snuggles! # # eoxml # # reader.each do |node| # # # node is an instance of Nokogiri::XML::Reader # puts node.name # # end # # Note that Nokogiri::XML::Reader#each can only be called once!! Once # the cursor moves through the entire document, you must parse the # document again. So make sure that you capture any information you # need during the first iteration. # # The Reader parser is good for when you need the speed of a SAX parser, # but do not want to write a Document handler. 
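    #
    # A minimal sketch, not part of the original documentation, of skipping
    # everything except element start nodes. It assumes Reader#node_type is
    # available in this version and uses the TYPE_ELEMENT constant defined
    # below; the XML string is illustrative.
    #
    #   reader = Nokogiri::XML::Reader("<root><item id='1'>a</item></root>")
    #
    #   reader.each do |node|
    #     next unless node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    #     puts "#{node.name}: #{node.attributes.inspect}"
    #   end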
class Reader include Enumerable TYPE_NONE = 0 # Element node type TYPE_ELEMENT = 1 # Attribute node type TYPE_ATTRIBUTE = 2 # Text node type TYPE_TEXT = 3 # CDATA node type TYPE_CDATA = 4 # Entity Reference node type TYPE_ENTITY_REFERENCE = 5 # Entity node type TYPE_ENTITY = 6 # PI node type TYPE_PROCESSING_INSTRUCTION = 7 # Comment node type TYPE_COMMENT = 8 # Document node type TYPE_DOCUMENT = 9 # Document Type node type TYPE_DOCUMENT_TYPE = 10 # Document Fragment node type TYPE_DOCUMENT_FRAGMENT = 11 # Notation node type TYPE_NOTATION = 12 # Whitespace node type TYPE_WHITESPACE = 13 # Significant Whitespace node type TYPE_SIGNIFICANT_WHITESPACE = 14 # Element end node type TYPE_END_ELEMENT = 15 # Entity end node type TYPE_END_ENTITY = 16 # XML Declaration node type TYPE_XML_DECLARATION = 17 # A list of errors encountered while parsing attr_accessor :errors # The encoding for the document attr_reader :encoding # The XML source attr_reader :source alias :self_closing? :empty_element? def initialize source, url = nil, encoding = nil # :nodoc: @source = source @errors = [] @encoding = encoding end private :initialize ### # Get a list of attributes for the current node. def attributes Hash[attribute_nodes.map { |node| [node.name, node.to_s] }].merge(namespaces || {}) end ### # Get a list of attributes for the current node def attribute_nodes nodes = attr_nodes nodes.each { |v| v.instance_variable_set(:@_r, self) } nodes end ### # Move the cursor through the document yielding the cursor to the block def each while cursor = self.read yield cursor end end end end end nokogiri-1.6.1/lib/nokogiri/xml/document_fragment.rb0000644000175000017500000000610212261213762022134 0ustar boutilboutilmodule Nokogiri module XML class DocumentFragment < Nokogiri::XML::Node ## # Create a new DocumentFragment from +tags+. # # If +ctx+ is present, it is used as a context node for the # subtree created, e.g., namespaces will be resolved relative # to +ctx+. 
def initialize document, tags = nil, ctx = nil return self unless tags children = if ctx # Fix for issue#490 if Nokogiri.jruby? # fix for issue #770 ctx.parse("#{tags}").children else ctx.parse(tags) end else XML::Document.parse("#{tags}") \ .xpath("/root/node()") end children.each { |child| child.parent = self } end ### # return the name for DocumentFragment def name '#document-fragment' end ### # Convert this DocumentFragment to a string def to_s children.to_s end ### # Convert this DocumentFragment to html # See Nokogiri::XML::NodeSet#to_html def to_html *args if Nokogiri.jruby? options = args.first.is_a?(Hash) ? args.shift : {} if !options[:save_with] options[:save_with] = Node::SaveOptions::NO_DECLARATION | Node::SaveOptions::NO_EMPTY_TAGS | Node::SaveOptions::AS_HTML end args.insert(0, options) end children.to_html(*args) end ### # Convert this DocumentFragment to xhtml # See Nokogiri::XML::NodeSet#to_xhtml def to_xhtml *args if Nokogiri.jruby? options = args.first.is_a?(Hash) ? args.shift : {} if !options[:save_with] options[:save_with] = Node::SaveOptions::NO_DECLARATION | Node::SaveOptions::NO_EMPTY_TAGS | Node::SaveOptions::AS_XHTML end args.insert(0, options) end children.to_xhtml(*args) end ### # Convert this DocumentFragment to xml # See Nokogiri::XML::NodeSet#to_xml def to_xml *args children.to_xml(*args) end ### # Search this fragment. See Nokogiri::XML::Node#css def css *args if children.any? children.css(*args) else NodeSet.new(document) end end alias :serialize :to_s class << self #### # Create a Nokogiri::XML::DocumentFragment from +tags+ def parse tags self.new(XML::Document.new, tags) end end private # fix for issue 770 def namespace_declarations ctx ctx.namespace_scopes.map do |namespace| prefix = namespace.prefix.nil? ? 
"" : ":#{namespace.prefix}" %Q{xmlns#{prefix}="#{namespace.href}"} end.join ' ' end def coerce data return super unless String === data document.fragment(data).children end end end end nokogiri-1.6.1/lib/nokogiri/xml/namespace.rb0000644000175000017500000000032412261213762020367 0ustar boutilboutilmodule Nokogiri module XML class Namespace include Nokogiri::XML::PP::Node attr_reader :document private def inspect_attributes [:prefix, :href] end end end end nokogiri-1.6.1/lib/nokogiri/xml/character_data.rb0000644000175000017500000000021412261213762021356 0ustar boutilboutilmodule Nokogiri module XML class CharacterData < Nokogiri::XML::Node include Nokogiri::XML::PP::CharacterData end end end nokogiri-1.6.1/lib/nokogiri/xml/entity_decl.rb0000644000175000017500000000073112261213762020740 0ustar boutilboutilmodule Nokogiri module XML class EntityDecl < Nokogiri::XML::Node undef_method :attribute_nodes undef_method :attributes undef_method :namespace undef_method :namespace_definitions undef_method :line if method_defined?(:line) def self.new name, doc, *args doc.create_entity(name, *args) end def inspect "#<#{self.class.name}:#{sprintf("0x%x", object_id)} #{to_s.inspect}>" end end end end nokogiri-1.6.1/lib/nokogiri/xml/pp/0000755000175000017500000000000012261213762016526 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/xml/pp/node.rb0000644000175000017500000000315612261213762020005 0ustar boutilboutilmodule Nokogiri module XML module PP module Node def inspect # :nodoc: attributes = inspect_attributes.reject { |x| begin attribute = send x !attribute || (attribute.respond_to?(:empty?) && attribute.empty?) 
rescue NoMethodError true end }.map { |attribute| "#{attribute.to_s.sub(/_\w+/, 's')}=#{send(attribute).inspect}" }.join ' ' "#<#{self.class.name}:#{sprintf("0x%x", object_id)} #{attributes}>" end def pretty_print pp # :nodoc: nice_name = self.class.name.split('::').last pp.group(2, "#(#{nice_name}:#{sprintf("0x%x", object_id)} {", '})') do pp.breakable attrs = inspect_attributes.map { |t| [t, send(t)] if respond_to?(t) }.compact.find_all { |x| if x.last if [:attribute_nodes, :children].include? x.first !x.last.empty? else true end end } pp.seplist(attrs) do |v| if [:attribute_nodes, :children].include? v.first pp.group(2, "#{v.first.to_s.sub(/_\w+$/, 's')} = [", "]") do pp.breakable pp.seplist(v.last) do |item| pp.pp item end end else pp.text "#{v.first} = " pp.pp v.last end end pp.breakable end end end end end end nokogiri-1.6.1/lib/nokogiri/xml/pp/character_data.rb0000644000175000017500000000063312261213762022002 0ustar boutilboutilmodule Nokogiri module XML module PP module CharacterData def pretty_print pp # :nodoc: nice_name = self.class.name.split('::').last pp.group(2, "#(#{nice_name} ", ')') do pp.pp text end end def inspect # :nodoc: "#<#{self.class.name}:#{sprintf("0x%x",object_id)} #{text.inspect}>" end end end end end nokogiri-1.6.1/lib/nokogiri/xml/notation.rb0000644000175000017500000000015612261213762020271 0ustar boutilboutilmodule Nokogiri module XML class Notation < Struct.new(:name, :public_id, :system_id) end end end nokogiri-1.6.1/lib/nokogiri/xml/relax_ng.rb0000644000175000017500000000155612261213762020242 0ustar boutilboutilmodule Nokogiri module XML class << self ### # Create a new Nokogiri::XML::RelaxNG document from +string_or_io+. # See Nokogiri::XML::RelaxNG for an example. def RelaxNG string_or_io RelaxNG.new(string_or_io) end end ### # Nokogiri::XML::RelaxNG is used for validating XML against a # RelaxNG schema. # # == Synopsis # # Validate an XML document against a RelaxNG schema. 
Loop over the errors # that are returned and print them out: # # schema = Nokogiri::XML::RelaxNG(File.open(ADDRESS_SCHEMA_FILE)) # doc = Nokogiri::XML(File.open(ADDRESS_XML_FILE)) # # schema.validate(doc).each do |error| # puts error.message # end # # Each error is a Nokogiri::XML::SyntaxError object. class RelaxNG < Nokogiri::XML::Schema end end end nokogiri-1.6.1/lib/nokogiri/decorators/0000755000175000017500000000000012261213762017454 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/decorators/slop.rb0000644000175000017500000000175712261213762020770 0ustar boutilboutilmodule Nokogiri module Decorators ### # The Slop decorator implements method_missing so that method calls may be # used instead of XPath or CSS. See Nokogiri.Slop module Slop ### # Look for a node named +name+. See Nokogiri.Slop def method_missing name, *args, &block prefix = implied_xpath_context if args.empty? list = xpath("#{prefix}#{name.to_s.sub(/^_/, '')}") elsif args.first.is_a? Hash hash = args.first if hash[:css] list = css("#{name}#{hash[:css]}") elsif hash[:xpath] conds = Array(hash[:xpath]).join(' and ') list = xpath("#{prefix}#{name}[#{conds}]") end else CSS::Parser.without_cache do list = xpath( *CSS.xpath_for("#{name}#{args.first}", :prefix => prefix) ) end end super if list.empty? list.length == 1 ? 
list.first : list end end end end nokogiri-1.6.1/lib/nokogiri/xml.rb0000644000175000017500000000454412261213762016443 0ustar boutilboutilrequire 'nokogiri/xml/pp' require 'nokogiri/xml/parse_options' require 'nokogiri/xml/sax' require 'nokogiri/xml/node' require 'nokogiri/xml/attribute_decl' require 'nokogiri/xml/element_decl' require 'nokogiri/xml/element_content' require 'nokogiri/xml/character_data' require 'nokogiri/xml/namespace' require 'nokogiri/xml/attr' require 'nokogiri/xml/dtd' require 'nokogiri/xml/cdata' require 'nokogiri/xml/text' require 'nokogiri/xml/document' require 'nokogiri/xml/document_fragment' require 'nokogiri/xml/processing_instruction' require 'nokogiri/xml/node_set' require 'nokogiri/xml/syntax_error' require 'nokogiri/xml/xpath' require 'nokogiri/xml/xpath_context' require 'nokogiri/xml/builder' require 'nokogiri/xml/reader' require 'nokogiri/xml/notation' require 'nokogiri/xml/entity_decl' require 'nokogiri/xml/schema' require 'nokogiri/xml/relax_ng' module Nokogiri class << self ### # Parse XML. Convenience method for Nokogiri::XML::Document.parse def XML thing, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_XML, &block Nokogiri::XML::Document.parse(thing, url, encoding, options, &block) end end module XML # Original C14N 1.0 spec canonicalization XML_C14N_1_0 = 0 # Exclusive C14N 1.0 spec canonicalization XML_C14N_EXCLUSIVE_1_0 = 1 # C14N 1.1 spec canonicalization XML_C14N_1_1 = 2 class << self ### # Parse an XML document using the Nokogiri::XML::Reader API. See # Nokogiri::XML::Reader for more information def Reader string_or_io, url = nil, encoding = nil, options = ParseOptions::STRICT options = Nokogiri::XML::ParseOptions.new(options) if Fixnum === options # Give the options to the user yield options if block_given? if string_or_io.respond_to? :read return Reader.from_io(string_or_io, url, encoding, options.to_i) end Reader.from_memory(string_or_io, url, encoding, options.to_i) end ### # Parse XML. 
Convenience method for Nokogiri::XML::Document.parse def parse thing, url = nil, encoding = nil, options = ParseOptions::DEFAULT_XML, &block Document.parse(thing, url, encoding, options, &block) end #### # Parse a fragment from +string+ into a DocumentFragment. def fragment string XML::DocumentFragment.parse(string) end end end end nokogiri-1.6.1/lib/nokogiri/css/0000755000175000017500000000000012261213762016077 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/css/xpath_visitor.rb0000644000175000017500000001273312261213762021335 0ustar boutilboutilmodule Nokogiri module CSS class XPathVisitor # :nodoc: def visit_function node # note that nth-child and nth-last-child are preprocessed in css/node.rb. msg = :"visit_function_#{node.value.first.gsub(/[(]/, '')}" return self.send(msg, node) if self.respond_to?(msg) case node.value.first when /^text\(/ 'child::text()' when /^self\(/ "self::#{node.value[1]}" when /^eq\(/ "position() = #{node.value[1]}" when /^(nth|nth-of-type|nth-child)\(/ if node.value[1].is_a?(Nokogiri::CSS::Node) and node.value[1].type == :AN_PLUS_B an_plus_b(node.value[1]) else "position() = #{node.value[1]}" end when /^(nth-last-child|nth-last-of-type)\(/ if node.value[1].is_a?(Nokogiri::CSS::Node) and node.value[1].type == :AN_PLUS_B an_plus_b(node.value[1], :last => true) else index = node.value[1].to_i - 1 index == 0 ? 
"position() = last()" : "position() = last() - #{index}" end when /^(first|first-of-type)\(/ "position() = 1" when /^(last|last-of-type)\(/ "position() = last()" when /^contains\(/ "contains(., #{node.value[1]})" when /^gt\(/ "position() > #{node.value[1]}" when /^only-child\(/ "last() = 1" when /^comment\(/ "comment()" when /^has\(/ node.value[1].accept(self) else args = ['.'] + node.value[1..-1] "#{node.value.first}#{args.join(', ')})" end end def visit_not node child = node.value.first if :ELEMENT_NAME == child.type "not(self::#{child.accept(self)})" else "not(#{child.accept(self)})" end end def visit_id node node.value.first =~ /^#(.*)$/ "@id = '#{$1}'" end def visit_attribute_condition node attribute = if (node.value.first.type == :FUNCTION) or (node.value.first.value.first =~ /::/) '' else '@' end attribute += node.value.first.accept(self) # Support non-standard css attribute.gsub!(/^@@/, '@') return attribute unless node.value.length == 3 value = node.value.last value = "'#{value}'" if value !~ /^['"]/ case node.value[1] when :equal attribute + " = " + "#{value}" when :not_equal attribute + " != " + "#{value}" when :substring_match "contains(#{attribute}, #{value})" when :prefix_match "starts-with(#{attribute}, #{value})" when :dash_match "#{attribute} = #{value} or starts-with(#{attribute}, concat(#{value}, '-'))" when :includes "contains(concat(\" \", #{attribute}, \" \"),concat(\" \", #{value}, \" \"))" when :suffix_match "substring(#{attribute}, string-length(#{attribute}) - " + "string-length(#{value}) + 1, string-length(#{value})) = #{value}" else attribute + " #{node.value[1]} " + "#{value}" end end def visit_pseudo_class node if node.value.first.is_a?(Nokogiri::CSS::Node) and node.value.first.type == :FUNCTION node.value.first.accept(self) else msg = :"visit_pseudo_class_#{node.value.first.gsub(/[(]/, '')}" return self.send(msg, node) if self.respond_to?(msg) case node.value.first when "first", "first-child" then "position() = 1" when "last", 
"last-child" then "position() = last()" when "first-of-type" then "position() = 1" when "last-of-type" then "position() = last()" when "only-of-type" then "last() = 1" when "empty" then "not(node())" when "parent" then "node()" when "root" then "not(parent::*)" else node.value.first + "(.)" end end end def visit_class_condition node "contains(concat(' ', normalize-space(@class), ' '), ' #{node.value.first} ')" end { 'combinator' => ' and ', 'direct_adjacent_selector' => "/following-sibling::*[1]/self::", 'following_selector' => "/following-sibling::", 'descendant_selector' => '//', 'child_selector' => '/', }.each do |k,v| class_eval %{ def visit_#{k} node "\#{node.value.first.accept(self) if node.value.first}#{v}\#{node.value.last.accept(self)}" end } end def visit_conditional_selector node node.value.first.accept(self) + '[' + node.value.last.accept(self) + ']' end def visit_element_name node node.value.first end def accept node node.accept(self) end private def an_plus_b node, options={} raise ArgumentError, "expected an+b node to contain 4 tokens, but is #{node.value.inspect}" unless node.value.size == 4 a = node.value[0].to_i b = node.value[3].to_i position = options[:last] ? "(last()-position()+1)" : "position()" if (b == 0) return "(#{position} mod #{a}) = 0" else compare = (a < 0) ? 
"<=" : ">=" return "(#{position} #{compare} #{b}) and (((#{position}-#{b}) mod #{a.abs}) = 0)" end end end end end nokogiri-1.6.1/lib/nokogiri/css/parser.y0000644000175000017500000001611712261213762017573 0ustar boutilboutilclass Nokogiri::CSS::Parser token FUNCTION INCLUDES DASHMATCH LBRACE HASH PLUS GREATER S STRING IDENT token COMMA NUMBER PREFIXMATCH SUFFIXMATCH SUBSTRINGMATCH TILDE NOT_EQUAL token SLASH DOUBLESLASH NOT EQUAL RPAREN LSQUARE RSQUARE HAS rule selector : selector COMMA simple_selector_1toN { result = [val.first, val.last].flatten } | prefixless_combinator_selector { result = val.flatten } | simple_selector_1toN { result = val.flatten } ; combinator : PLUS { result = :DIRECT_ADJACENT_SELECTOR } | GREATER { result = :CHILD_SELECTOR } | TILDE { result = :FOLLOWING_SELECTOR } | S { result = :DESCENDANT_SELECTOR } | DOUBLESLASH { result = :DESCENDANT_SELECTOR } | SLASH { result = :CHILD_SELECTOR } ; simple_selector : element_name hcap_0toN { result = if val[1].nil? val.first else Node.new(:CONDITIONAL_SELECTOR, [val.first, val[1]]) end } | element_name hcap_1toN negation { result = Node.new(:CONDITIONAL_SELECTOR, [ val.first, Node.new(:COMBINATOR, [val[1], val.last]) ] ) } | element_name negation { result = Node.new(:CONDITIONAL_SELECTOR, val) } | function | function pseudo { result = Node.new(:CONDITIONAL_SELECTOR, val) } | function attrib { result = Node.new(:CONDITIONAL_SELECTOR, val) } | hcap_1toN negation { result = Node.new(:CONDITIONAL_SELECTOR, [ Node.new(:ELEMENT_NAME, ['*']), Node.new(:COMBINATOR, val) ] ) } | hcap_1toN { result = Node.new(:CONDITIONAL_SELECTOR, [Node.new(:ELEMENT_NAME, ['*']), val.first] ) } ; prefixless_combinator_selector : combinator simple_selector_1toN { result = Node.new(val.first, [nil, val.last]) } ; simple_selector_1toN : simple_selector combinator simple_selector_1toN { result = Node.new(val[1], [val.first, val.last]) } | simple_selector ; class : '.' 
IDENT { result = Node.new(:CLASS_CONDITION, [val[1]]) } ; element_name : namespaced_ident | '*' { result = Node.new(:ELEMENT_NAME, val) } ; namespaced_ident : namespace '|' IDENT { result = Node.new(:ELEMENT_NAME, [[val.first, val.last].compact.join(':')] ) } | IDENT { name = @namespaces.key?('xmlns') ? "xmlns:#{val.first}" : val.first result = Node.new(:ELEMENT_NAME, [name]) } ; namespace : IDENT { result = val[0] } | ; attrib : LSQUARE attrib_name attrib_val_0or1 RSQUARE { result = Node.new(:ATTRIBUTE_CONDITION, [val[1]] + (val[2] || []) ) } | LSQUARE function attrib_val_0or1 RSQUARE { result = Node.new(:ATTRIBUTE_CONDITION, [val[1]] + (val[2] || []) ) } | LSQUARE NUMBER RSQUARE { # Non standard, but hpricot supports it. result = Node.new(:PSEUDO_CLASS, [Node.new(:FUNCTION, ['nth-child(', val[1]])] ) } ; attrib_name : namespace '|' IDENT { result = Node.new(:ELEMENT_NAME, [[val.first, val.last].compact.join(':')] ) } | IDENT { # Default namespace is not applied to attributes. # So we don't add prefix "xmlns:" as in namespaced_ident. result = Node.new(:ELEMENT_NAME, [val.first]) } ; function : FUNCTION RPAREN { result = Node.new(:FUNCTION, [val.first.strip]) } | FUNCTION expr RPAREN { result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) } | FUNCTION an_plus_b RPAREN { result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) } | NOT expr RPAREN { result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) } | HAS selector RPAREN { result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) } ; expr : NUMBER COMMA expr { result = [val.first, val.last] } | STRING COMMA expr { result = [val.first, val.last] } | IDENT COMMA expr { result = [val.first, val.last] } | NUMBER | STRING | IDENT # even, odd { if val[0] == 'even' val = ["2","n","+","0"] result = Node.new(:AN_PLUS_B, val) elsif val[0] == 'odd' val = ["2","n","+","1"] result = Node.new(:AN_PLUS_B, val) else # This is not CSS standard. 
It allows us to support this: # assert_xpath("//a[foo(., @href)]", @parser.parse('a:foo(@href)')) # assert_xpath("//a[foo(., @a, b)]", @parser.parse('a:foo(@a, b)')) # assert_xpath("//a[foo(., a, 10)]", @parser.parse('a:foo(a, 10)')) result = val end } ; an_plus_b : NUMBER IDENT PLUS NUMBER # 5n+3 -5n+3 { if val[1] == 'n' result = Node.new(:AN_PLUS_B, val) else raise Racc::ParseError, "parse error on IDENT '#{val[1]}'" end } | IDENT PLUS NUMBER { # n+3, -n+3 if val[0] == 'n' val.unshift("1") result = Node.new(:AN_PLUS_B, val) elsif val[0] == '-n' val[0] = 'n' val.unshift("-1") result = Node.new(:AN_PLUS_B, val) else raise Racc::ParseError, "parse error on IDENT '#{val[1]}'" end } | NUMBER IDENT # 5n, -5n { if val[1] == 'n' val << "+" val << "0" result = Node.new(:AN_PLUS_B, val) else raise Racc::ParseError, "parse error on IDENT '#{val[1]}'" end } ; pseudo : ':' function { result = Node.new(:PSEUDO_CLASS, [val[1]]) } | ':' IDENT { result = Node.new(:PSEUDO_CLASS, [val[1]]) } ; hcap_0toN : hcap_1toN | ; hcap_1toN : attribute_id hcap_1toN { result = Node.new(:COMBINATOR, val) } | class hcap_1toN { result = Node.new(:COMBINATOR, val) } | attrib hcap_1toN { result = Node.new(:COMBINATOR, val) } | pseudo hcap_1toN { result = Node.new(:COMBINATOR, val) } | attribute_id | class | attrib | pseudo ; attribute_id : HASH { result = Node.new(:ID, val) } ; attrib_val_0or1 : eql_incl_dash IDENT { result = [val.first, val[1]] } | eql_incl_dash STRING { result = [val.first, val[1]] } | ; eql_incl_dash : EQUAL { result = :equal } | PREFIXMATCH { result = :prefix_match } | SUFFIXMATCH { result = :suffix_match } | SUBSTRINGMATCH { result = :substring_match } | NOT_EQUAL { result = :not_equal } | INCLUDES { result = :includes } | DASHMATCH { result = :dash_match } ; negation : NOT negation_arg RPAREN { result = Node.new(:NOT, [val[1]]) } ; negation_arg : element_name | element_name hcap_1toN | hcap_1toN ; end ---- header require 'nokogiri/css/parser_extras' 
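# The an_plus_b and expr rules above normalise every accepted form
# (5n+3, n+3, -n+3, 5n, even, odd) to four tokens: [a, "n", "+", b].
# The following is only a sketch of how such a token list can be turned
# into an XPath predicate, modelled on the private an_plus_b method in
# xpath_visitor.rb. The name an_plus_b_to_xpath is hypothetical and is
# not part of Nokogiri's API; it works on plain arrays of strings so it
# can be run without the parser.

```ruby
# Hypothetical stand-alone helper (an assumption for illustration only).
# tokens  - four strings, e.g. ["2", "n", "+", "1"]
# options - pass :last => true to count from the end (nth-last-child style)
def an_plus_b_to_xpath(tokens, options = {})
  unless tokens.size == 4
    raise ArgumentError, "expected 4 tokens, got #{tokens.inspect}"
  end

  a = tokens[0].to_i
  b = tokens[3].to_i
  position = options[:last] ? "(last()-position()+1)" : "position()"

  # With no offset, only the step matters.
  return "(#{position} mod #{a}) = 0" if b == 0

  # A negative step selects positions at or before b, a positive one
  # selects positions at or after b.
  compare = a < 0 ? "<=" : ">="
  "(#{position} #{compare} #{b}) and (((#{position}-#{b}) mod #{a.abs}) = 0)"
end

puts an_plus_b_to_xpath(%w[2 n + 0])                  # the "even" form
puts an_plus_b_to_xpath(%w[2 n + 1])                  # the "odd" form
puts an_plus_b_to_xpath(%w[-1 n + 3], :last => true)  # "-n+3", from the end
```

# Keeping the token list at a fixed length of four is what lets the
# consumer read a and b by index without re-parsing the selector text.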
nokogiri-1.6.1/lib/nokogiri/css/parser_extras.rb0000644000175000017500000000456612261213762021321 0ustar boutilboutilrequire 'thread' module Nokogiri module CSS class Parser < Racc::Parser @cache_on = true @cache = {} @mutex = Mutex.new class << self # Turn on CSS parse caching attr_accessor :cache_on alias :cache_on? :cache_on alias :set_cache :cache_on= # Get the css selector in +string+ from the cache def [] string return unless @cache_on @mutex.synchronize { @cache[string] } end # Set the css selector in +string+ in the cache to +value+ def []= string, value return value unless @cache_on @mutex.synchronize { @cache[string] = value } end # Clear the cache def clear_cache @mutex.synchronize { @cache = {} } end # Execute +block+ without cache, restoring the previous cache setting # afterwards even if +block+ raises def without_cache &block tmp = @cache_on @cache_on = false block.call ensure @cache_on = tmp end ### # Parse this CSS selector in +selector+. Returns an AST. def parse selector @warned ||= false unless @warned $stderr.puts('Nokogiri::CSS::Parser.parse is deprecated, call Nokogiri::CSS.parse(), this will be removed August 1st or version 1.4.0 (whichever is first)') @warned = true end new.parse selector end end # Create a new CSS parser with respect to +namespaces+ def initialize namespaces = {} @tokenizer = Tokenizer.new @namespaces = namespaces super() end def parse string @tokenizer.scan_setup string do_parse end def next_token @tokenizer.next_token end # Get the xpath for +string+ using +options+ def xpath_for string, options={} key = "#{string}#{options[:ns]}#{options[:prefix]}" v = self.class[key] return v if v args = [ options[:prefix] || '//', options[:visitor] || XPathVisitor.new ] self.class[key] = parse(string).map { |ast| ast.to_xpath(*args) } end # On CSS parser error, raise an exception def on_error error_token_id, error_value, value_stack after = value_stack.compact.last raise SyntaxError.new("unexpected '#{error_value}' after '#{after}'") end end end end 
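# The class-level cache in Parser above combines three things: an on/off
# flag, a Hash guarded by a Mutex, and a way to switch the flag off for
# the duration of a block. The following is a minimal stand-alone sketch
# of that pattern, assuming nothing beyond the Ruby standard library.
# The class name SelectorCache and its instance-level layout are
# hypothetical and are not part of Nokogiri.

```ruby
require 'thread'

# Hypothetical illustration of a switchable, mutex-guarded lookup cache.
class SelectorCache
  def initialize
    @enabled = true
    @store   = {}
    @mutex   = Mutex.new
  end

  # Return the cached value for +key+, or nil when caching is off.
  def [](key)
    return unless @enabled
    @mutex.synchronize { @store[key] }
  end

  # Store +value+ under +key+ unless caching is off.
  def []=(key, value)
    return value unless @enabled
    @mutex.synchronize { @store[key] = value }
  end

  # Run the block with caching disabled. The ensure clause puts the
  # previous setting back even when the block raises.
  def without_cache
    previous = @enabled
    @enabled = false
    yield
  ensure
    @enabled = previous
  end
end

cache = SelectorCache.new
cache['a.b'] = ['//a[contains(@class, "b")]']
cache.without_cache do
  cache['p'] = ['//p']  # ignored while the cache is switched off
end
puts cache['a.b'].inspect
puts cache['p'].inspect
```

# Note that the flag itself is read outside the mutex, as in the class
# above, so this sketch makes no claim about toggling the flag from
# several threads at once.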
nokogiri-1.6.1/lib/nokogiri/css/node.rb0000644000175000017500000000575512261213762017365 0ustar boutilboutilmodule Nokogiri module CSS class Node ALLOW_COMBINATOR_ON_SELF = [:DIRECT_ADJACENT_SELECTOR, :FOLLOWING_SELECTOR, :CHILD_SELECTOR] # Get the type of this node attr_accessor :type # Get the value of this node attr_accessor :value # Create a new Node with +type+ and +value+ def initialize type, value @type = type @value = value end # Accept +visitor+ def accept visitor visitor.send(:"visit_#{type.to_s.downcase}", self) end ### # Convert this CSS node to xpath with +prefix+ using +visitor+ def to_xpath prefix = '//', visitor = XPathVisitor.new self.preprocess! prefix = '.' if ALLOW_COMBINATOR_ON_SELF.include?(type) && value.first.nil? prefix + visitor.accept(self) end # Preprocess this node tree def preprocess! ### Deal with nth-child matches = find_by_type( [:CONDITIONAL_SELECTOR, [:ELEMENT_NAME], [:PSEUDO_CLASS, [:FUNCTION] ] ] ) matches.each do |match| if match.value[1].value[0].value[0] =~ /^nth-(last-)?child/ tag_name = match.value[0].value.first match.value[0].value = ['*'] match.value[1] = Node.new(:COMBINATOR, [ match.value[1].value[0], Node.new(:FUNCTION, ['self(', tag_name]) ]) end end ### Deal with first-child, last-child matches = find_by_type( [:CONDITIONAL_SELECTOR, [:ELEMENT_NAME], [:PSEUDO_CLASS] ]) matches.each do |match| if ['first-child', 'last-child'].include?(match.value[1].value.first) which = match.value[1].value.first.gsub(/-\w*$/, '') tag_name = match.value[0].value.first match.value[0].value = ['*'] match.value[1] = Node.new(:COMBINATOR, [ Node.new(:FUNCTION, ["#{which}("]), Node.new(:FUNCTION, ['self(', tag_name]) ]) elsif 'only-child' == match.value[1].value.first tag_name = match.value[0].value.first match.value[0].value = ['*'] match.value[1] = Node.new(:COMBINATOR, [ Node.new(:FUNCTION, ["#{match.value[1].value.first}("]), Node.new(:FUNCTION, ['self(', tag_name]) ]) end end self end # Find a node by type using +types+ def 
find_by_type types matches = [] matches << self if to_type == types @value.each do |v| matches += v.find_by_type(types) if v.respond_to?(:find_by_type) end matches end # Convert to_type def to_type [@type] + @value.map { |n| n.to_type if n.respond_to?(:to_type) }.compact end # Convert to array def to_a [@type] + @value.map { |n| n.respond_to?(:to_a) ? n.to_a : [n] } end end end end nokogiri-1.6.1/lib/nokogiri/css/parser.rb0000644000175000017500000004162012261213762017723 0ustar boutilboutil# # DO NOT MODIFY!!!! # This file is automatically generated by Racc 1.4.8 # from Racc grammer file "". # require 'racc/parser.rb' require 'nokogiri/css/parser_extras' module Nokogiri module CSS class Parser < Racc::Parser ##### State transition tables begin ### racc_action_table = [ 21, 4, 5, 7, 29, 4, 5, 7, 30, 19, -26, 6, 21, 9, 8, 6, 29, 9, 8, 22, 31, 19, 20, 21, 23, 15, 17, 29, 24, 83, 31, 22, 19, 84, 20, 21, 23, 15, 17, 29, 24, 92, 22, 85, 19, 20, 21, 23, 15, 17, 20, 24, 82, 90, 22, 59, 24, 20, 89, 23, 15, 17, 21, 24, 88, 22, 29, 4, 5, 7, 23, 19, 71, 29, 91, 29, 86, 6, 19, 9, 8, 22, 29, 29, 20, 89, 23, 15, 17, 35, 24, 20, 29, 20, 15, 17, 15, 24, 35, 24, 20, 20, 29, 15, 15, 93, 24, 24, 21, 64, 20, 95, 29, 15, 97, 96, 24, 43, -26, 46, 20, 52, 53, 15, 51, 98, 24, 22, 79, 80, 20, 99, 23, 15, 48, 42, 24, 79, 80, 75, 76, 77, 101, 78, 87, 86, 41, 74, 75, 76, 77, 35, 78, 104, 52, 56, 74, 55, 52, 56, 105, 55, 52, 56, nil, 55, 52, 56, nil, 55 ] racc_action_check = [ 0, 14, 14, 14, 0, 0, 0, 0, 1, 0, 43, 14, 40, 14, 14, 0, 40, 0, 0, 0, 1, 40, 0, 31, 0, 0, 0, 31, 0, 47, 57, 40, 31, 49, 40, 13, 40, 40, 40, 13, 40, 57, 31, 50, 13, 31, 24, 31, 31, 31, 11, 31, 46, 53, 13, 24, 11, 13, 53, 13, 13, 13, 23, 13, 52, 24, 23, 23, 23, 23, 24, 23, 42, 35, 54, 28, 55, 23, 35, 23, 23, 23, 27, 10, 23, 56, 23, 23, 23, 33, 23, 35, 26, 28, 35, 35, 28, 35, 10, 28, 27, 10, 25, 27, 10, 67, 27, 10, 20, 30, 26, 72, 68, 26, 73, 73, 26, 20, 19, 20, 25, 21, 21, 25, 21, 81, 25, 20, 45, 45, 68, 83, 20, 68, 21, 18, 
68, 44, 44, 45, 45, 45, 87, 45, 51, 51, 15, 45, 44, 44, 44, 12, 44, 90, 89, 89, 44, 89, 86, 86, 101, 86, 88, 88, nil, 88, 22, 22, nil, 22 ] racc_action_pointer = [ -2, 8, nil, nil, nil, nil, nil, nil, nil, nil, 77, 26, 130, 33, -6, 135, nil, nil, 106, 89, 106, 111, 156, 60, 44, 96, 86, 76, 69, nil, 109, 21, nil, 68, nil, 67, nil, nil, nil, nil, 10, nil, 61, -19, 134, 125, 27, 0, nil, 10, 20, 133, 52, 46, 51, 64, 73, 18, nil, nil, nil, nil, nil, nil, nil, nil, nil, 82, 106, nil, nil, nil, 86, 104, nil, nil, nil, nil, nil, nil, nil, 100, nil, 120, nil, nil, 148, 135, 152, 144, 140, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, 147, nil, nil, nil, nil ] racc_action_default = [ -27, -74, -2, -3, -4, -5, -6, -7, -8, -9, -50, -13, -17, -27, -20, -74, -22, -23, -74, -25, -27, -74, -74, -27, -74, -55, -56, -57, -58, -59, -74, -27, -10, -49, -12, -27, -14, -15, -16, -18, -27, -21, -74, -32, -62, -62, -74, -74, -33, -74, -74, -41, -42, -43, -74, -41, -43, -74, -47, -48, -51, -52, -53, -54, 106, -1, -11, -74, -71, -73, -19, -24, -74, -74, -63, -64, -65, -66, -67, -68, -69, -74, -30, -74, -34, -35, -74, -46, -74, -74, -74, -36, -37, -70, -72, -28, -60, -61, -29, -31, -38, -74, -39, -40, -45, -44 ] racc_goto_table = [ 49, 54, 33, 39, 36, 1, 34, 45, 38, 72, 81, 58, 32, 37, 47, 44, 68, 60, 61, 62, 63, 65, 40, 50, 67, nil, nil, 69, 57, 66, 70, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, 94, nil, nil, nil, nil, 100, nil, 102, 103 ] racc_goto_check = [ 18, 18, 8, 2, 11, 1, 9, 10, 9, 17, 17, 10, 7, 12, 15, 16, 6, 8, 8, 8, 8, 2, 4, 19, 22, nil, nil, 8, 1, 9, 2, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, 8, nil, nil, nil, nil, 18, nil, 18, 18 ] racc_goto_pointer = [ nil, 5, -10, nil, 8, nil, -19, 2, -8, -4, -13, -7, 2, nil, nil, -6, -5, -35, -21, 2, nil, nil, -11 ] racc_goto_default 
= [ nil, nil, 3, 2, 13, 14, 10, nil, 12, nil, 11, 28, 27, 26, 16, 18, nil, nil, nil, nil, 25, 73, nil ] racc_reduce_table = [ 0, 0, :racc_error, 3, 32, :_reduce_1, 1, 32, :_reduce_2, 1, 32, :_reduce_3, 1, 35, :_reduce_4, 1, 35, :_reduce_5, 1, 35, :_reduce_6, 1, 35, :_reduce_7, 1, 35, :_reduce_8, 1, 35, :_reduce_9, 2, 36, :_reduce_10, 3, 36, :_reduce_11, 2, 36, :_reduce_12, 1, 36, :_reduce_none, 2, 36, :_reduce_14, 2, 36, :_reduce_15, 2, 36, :_reduce_16, 1, 36, :_reduce_17, 2, 34, :_reduce_18, 3, 33, :_reduce_19, 1, 33, :_reduce_none, 2, 44, :_reduce_21, 1, 37, :_reduce_none, 1, 37, :_reduce_23, 3, 45, :_reduce_24, 1, 45, :_reduce_25, 1, 46, :_reduce_26, 0, 46, :_reduce_none, 4, 43, :_reduce_28, 4, 43, :_reduce_29, 3, 43, :_reduce_30, 3, 47, :_reduce_31, 1, 47, :_reduce_32, 2, 41, :_reduce_33, 3, 41, :_reduce_34, 3, 41, :_reduce_35, 3, 41, :_reduce_36, 3, 41, :_reduce_37, 3, 49, :_reduce_38, 3, 49, :_reduce_39, 3, 49, :_reduce_40, 1, 49, :_reduce_none, 1, 49, :_reduce_none, 1, 49, :_reduce_43, 4, 50, :_reduce_44, 3, 50, :_reduce_45, 2, 50, :_reduce_46, 2, 42, :_reduce_47, 2, 42, :_reduce_48, 1, 38, :_reduce_none, 0, 38, :_reduce_none, 2, 39, :_reduce_51, 2, 39, :_reduce_52, 2, 39, :_reduce_53, 2, 39, :_reduce_54, 1, 39, :_reduce_none, 1, 39, :_reduce_none, 1, 39, :_reduce_none, 1, 39, :_reduce_none, 1, 51, :_reduce_59, 2, 48, :_reduce_60, 2, 48, :_reduce_61, 0, 48, :_reduce_none, 1, 52, :_reduce_63, 1, 52, :_reduce_64, 1, 52, :_reduce_65, 1, 52, :_reduce_66, 1, 52, :_reduce_67, 1, 52, :_reduce_68, 1, 52, :_reduce_69, 3, 40, :_reduce_70, 1, 53, :_reduce_none, 2, 53, :_reduce_none, 1, 53, :_reduce_none ] racc_reduce_n = 74 racc_shift_n = 106 racc_token_table = { false => 0, :error => 1, :FUNCTION => 2, :INCLUDES => 3, :DASHMATCH => 4, :LBRACE => 5, :HASH => 6, :PLUS => 7, :GREATER => 8, :S => 9, :STRING => 10, :IDENT => 11, :COMMA => 12, :NUMBER => 13, :PREFIXMATCH => 14, :SUFFIXMATCH => 15, :SUBSTRINGMATCH => 16, :TILDE => 17, :NOT_EQUAL => 18, :SLASH => 19, 
:DOUBLESLASH => 20, :NOT => 21, :EQUAL => 22, :RPAREN => 23, :LSQUARE => 24, :RSQUARE => 25, :HAS => 26, "." => 27, "*" => 28, "|" => 29, ":" => 30 } racc_nt_base = 31 racc_use_result_var = true Racc_arg = [ racc_action_table, racc_action_check, racc_action_default, racc_action_pointer, racc_goto_table, racc_goto_check, racc_goto_default, racc_goto_pointer, racc_nt_base, racc_reduce_table, racc_token_table, racc_shift_n, racc_reduce_n, racc_use_result_var ] Racc_token_to_s_table = [ "$end", "error", "FUNCTION", "INCLUDES", "DASHMATCH", "LBRACE", "HASH", "PLUS", "GREATER", "S", "STRING", "IDENT", "COMMA", "NUMBER", "PREFIXMATCH", "SUFFIXMATCH", "SUBSTRINGMATCH", "TILDE", "NOT_EQUAL", "SLASH", "DOUBLESLASH", "NOT", "EQUAL", "RPAREN", "LSQUARE", "RSQUARE", "HAS", "\".\"", "\"*\"", "\"|\"", "\":\"", "$start", "selector", "simple_selector_1toN", "prefixless_combinator_selector", "combinator", "simple_selector", "element_name", "hcap_0toN", "hcap_1toN", "negation", "function", "pseudo", "attrib", "class", "namespaced_ident", "namespace", "attrib_name", "attrib_val_0or1", "expr", "an_plus_b", "attribute_id", "eql_incl_dash", "negation_arg" ] Racc_debug_parser = false ##### State transition tables end ##### # reduce 0 omitted def _reduce_1(val, _values, result) result = [val.first, val.last].flatten result end def _reduce_2(val, _values, result) result = val.flatten result end def _reduce_3(val, _values, result) result = val.flatten result end def _reduce_4(val, _values, result) result = :DIRECT_ADJACENT_SELECTOR result end def _reduce_5(val, _values, result) result = :CHILD_SELECTOR result end def _reduce_6(val, _values, result) result = :FOLLOWING_SELECTOR result end def _reduce_7(val, _values, result) result = :DESCENDANT_SELECTOR result end def _reduce_8(val, _values, result) result = :DESCENDANT_SELECTOR result end def _reduce_9(val, _values, result) result = :CHILD_SELECTOR result end def _reduce_10(val, _values, result) result = if val[1].nil? 
val.first else Node.new(:CONDITIONAL_SELECTOR, [val.first, val[1]]) end result end def _reduce_11(val, _values, result) result = Node.new(:CONDITIONAL_SELECTOR, [ val.first, Node.new(:COMBINATOR, [val[1], val.last]) ] ) result end def _reduce_12(val, _values, result) result = Node.new(:CONDITIONAL_SELECTOR, val) result end # reduce 13 omitted def _reduce_14(val, _values, result) result = Node.new(:CONDITIONAL_SELECTOR, val) result end def _reduce_15(val, _values, result) result = Node.new(:CONDITIONAL_SELECTOR, val) result end def _reduce_16(val, _values, result) result = Node.new(:CONDITIONAL_SELECTOR, [ Node.new(:ELEMENT_NAME, ['*']), Node.new(:COMBINATOR, val) ] ) result end def _reduce_17(val, _values, result) result = Node.new(:CONDITIONAL_SELECTOR, [Node.new(:ELEMENT_NAME, ['*']), val.first] ) result end def _reduce_18(val, _values, result) result = Node.new(val.first, [nil, val.last]) result end def _reduce_19(val, _values, result) result = Node.new(val[1], [val.first, val.last]) result end # reduce 20 omitted def _reduce_21(val, _values, result) result = Node.new(:CLASS_CONDITION, [val[1]]) result end # reduce 22 omitted def _reduce_23(val, _values, result) result = Node.new(:ELEMENT_NAME, val) result end def _reduce_24(val, _values, result) result = Node.new(:ELEMENT_NAME, [[val.first, val.last].compact.join(':')] ) result end def _reduce_25(val, _values, result) name = @namespaces.key?('xmlns') ? "xmlns:#{val.first}" : val.first result = Node.new(:ELEMENT_NAME, [name]) result end def _reduce_26(val, _values, result) result = val[0] result end # reduce 27 omitted def _reduce_28(val, _values, result) result = Node.new(:ATTRIBUTE_CONDITION, [val[1]] + (val[2] || []) ) result end def _reduce_29(val, _values, result) result = Node.new(:ATTRIBUTE_CONDITION, [val[1]] + (val[2] || []) ) result end def _reduce_30(val, _values, result) # Non standard, but hpricot supports it. 
result = Node.new(:PSEUDO_CLASS, [Node.new(:FUNCTION, ['nth-child(', val[1]])] ) result end def _reduce_31(val, _values, result) result = Node.new(:ELEMENT_NAME, [[val.first, val.last].compact.join(':')] ) result end def _reduce_32(val, _values, result) # Default namespace is not applied to attributes. # So we don't add prefix "xmlns:" as in namespaced_ident. result = Node.new(:ELEMENT_NAME, [val.first]) result end def _reduce_33(val, _values, result) result = Node.new(:FUNCTION, [val.first.strip]) result end def _reduce_34(val, _values, result) result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) result end def _reduce_35(val, _values, result) result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) result end def _reduce_36(val, _values, result) result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) result end def _reduce_37(val, _values, result) result = Node.new(:FUNCTION, [val.first.strip, val[1]].flatten) result end def _reduce_38(val, _values, result) result = [val.first, val.last] result end def _reduce_39(val, _values, result) result = [val.first, val.last] result end def _reduce_40(val, _values, result) result = [val.first, val.last] result end # reduce 41 omitted # reduce 42 omitted def _reduce_43(val, _values, result) if val[0] == 'even' val = ["2","n","+","0"] result = Node.new(:AN_PLUS_B, val) elsif val[0] == 'odd' val = ["2","n","+","1"] result = Node.new(:AN_PLUS_B, val) else # This is not CSS standard. 
It allows us to support this: # assert_xpath("//a[foo(., @href)]", @parser.parse('a:foo(@href)')) # assert_xpath("//a[foo(., @a, b)]", @parser.parse('a:foo(@a, b)')) # assert_xpath("//a[foo(., a, 10)]", @parser.parse('a:foo(a, 10)')) result = val end result end def _reduce_44(val, _values, result) if val[1] == 'n' result = Node.new(:AN_PLUS_B, val) else raise Racc::ParseError, "parse error on IDENT '#{val[1]}'" end result end def _reduce_45(val, _values, result) # n+3, -n+3 if val[0] == 'n' val.unshift("1") result = Node.new(:AN_PLUS_B, val) elsif val[0] == '-n' val[0] = 'n' val.unshift("-1") result = Node.new(:AN_PLUS_B, val) else raise Racc::ParseError, "parse error on IDENT '#{val[1]}'" end result end def _reduce_46(val, _values, result) if val[1] == 'n' val << "+" val << "0" result = Node.new(:AN_PLUS_B, val) else raise Racc::ParseError, "parse error on IDENT '#{val[1]}'" end result end def _reduce_47(val, _values, result) result = Node.new(:PSEUDO_CLASS, [val[1]]) result end def _reduce_48(val, _values, result) result = Node.new(:PSEUDO_CLASS, [val[1]]) result end # reduce 49 omitted # reduce 50 omitted def _reduce_51(val, _values, result) result = Node.new(:COMBINATOR, val) result end def _reduce_52(val, _values, result) result = Node.new(:COMBINATOR, val) result end def _reduce_53(val, _values, result) result = Node.new(:COMBINATOR, val) result end def _reduce_54(val, _values, result) result = Node.new(:COMBINATOR, val) result end # reduce 55 omitted # reduce 56 omitted # reduce 57 omitted # reduce 58 omitted def _reduce_59(val, _values, result) result = Node.new(:ID, val) result end def _reduce_60(val, _values, result) result = [val.first, val[1]] result end def _reduce_61(val, _values, result) result = [val.first, val[1]] result end # reduce 62 omitted def _reduce_63(val, _values, result) result = :equal result end def _reduce_64(val, _values, result) result = :prefix_match result end def _reduce_65(val, _values, result) result = :suffix_match result end 
def _reduce_66(val, _values, result) result = :substring_match result end def _reduce_67(val, _values, result) result = :not_equal result end def _reduce_68(val, _values, result) result = :includes result end def _reduce_69(val, _values, result) result = :dash_match result end def _reduce_70(val, _values, result) result = Node.new(:NOT, [val[1]]) result end # reduce 71 omitted # reduce 72 omitted # reduce 73 omitted def _reduce_none(val, _values, result) val[0] end end # class Parser end # module CSS end # module Nokogiri nokogiri-1.6.1/lib/nokogiri/css/tokenizer.rex0000644000175000017500000000361312261213762020634 0ustar boutilboutilmodule Nokogiri module CSS class Tokenizer # :nodoc: macro nl \n|\r\n|\r|\f w [\s]* nonascii [^\0-\177] num -?([0-9]+|[0-9]*\.[0-9]+) unicode \\[0-9A-Fa-f]{1,6}(\r\n|[\s])? escape {unicode}|\\[^\n\r\f0-9A-Fa-f] nmchar [_A-Za-z0-9-]|{nonascii}|{escape} nmstart [_A-Za-z]|{nonascii}|{escape} ident [-@]?({nmstart})({nmchar})* name ({nmchar})+ string1 "([^\n\r\f"]|{nl}|{nonascii}|{escape})*" string2 '([^\n\r\f']|{nl}|{nonascii}|{escape})*' string {string1}|{string2} rule # [:state] pattern [actions] has\({w} { [:HAS, text] } {ident}\({w} { [:FUNCTION, text] } {ident} { [:IDENT, text] } \#{name} { [:HASH, text] } {w}~={w} { [:INCLUDES, text] } {w}\|={w} { [:DASHMATCH, text] } {w}\^={w} { [:PREFIXMATCH, text] } {w}\$={w} { [:SUFFIXMATCH, text] } {w}\*={w} { [:SUBSTRINGMATCH, text] } {w}!={w} { [:NOT_EQUAL, text] } {w}={w} { [:EQUAL, text] } {w}\) { [:RPAREN, text] } {w}\[{w} { [:LSQUARE, text] } {w}\] { [:RSQUARE, text] } {w}\+{w} { [:PLUS, text] } {w}>{w} { [:GREATER, text] } {w},{w} { [:COMMA, text] } {w}~{w} { [:TILDE, text] } \:not\({w} { [:NOT, text] } {num} { [:NUMBER, text] } {w}\/\/{w} { [:DOUBLESLASH, text] } {w}\/{w} { [:SLASH, text] } U\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})? {[:UNICODE_RANGE, text] } [\s]+ { [:S, text] } {string} { [:STRING, text] } . 
{ [text, text] } end end end nokogiri-1.6.1/lib/nokogiri/css/syntax_error.rb0000644000175000017500000000017712261213762021170 0ustar boutilboutilrequire 'nokogiri/syntax_error' module Nokogiri module CSS class SyntaxError < ::Nokogiri::SyntaxError end end end nokogiri-1.6.1/lib/nokogiri/css/tokenizer.rb0000644000175000017500000000761412261213762020446 0ustar boutilboutil#-- # DO NOT MODIFY!!!! # This file is automatically generated by rex 1.0.5 # from lexical definition file "lib/nokogiri/css/tokenizer.rex". #++ module Nokogiri module CSS class Tokenizer # :nodoc: require 'strscan' class ScanError < StandardError ; end attr_reader :lineno attr_reader :filename attr_accessor :state def scan_setup(str) @ss = StringScanner.new(str) @lineno = 1 @state = nil end def action yield end def scan_str(str) scan_setup(str) do_parse end alias :scan :scan_str def load_file( filename ) @filename = filename open(filename, "r") do |f| scan_setup(f.read) end end def scan_file( filename ) load_file(filename) do_parse end def next_token return if @ss.eos? 
# skips empty actions until token = _next_token or @ss.eos?; end token end def _next_token text = @ss.peek(1) @lineno += 1 if text == "\n" token = case @state when nil case when (text = @ss.scan(/has\([\s]*/)) action { [:HAS, text] } when (text = @ss.scan(/[-@]?([_A-Za-z]|[^\0-\177]|\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f])([_A-Za-z0-9-]|[^\0-\177]|\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f])*\([\s]*/)) action { [:FUNCTION, text] } when (text = @ss.scan(/[-@]?([_A-Za-z]|[^\0-\177]|\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f])([_A-Za-z0-9-]|[^\0-\177]|\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f])*/)) action { [:IDENT, text] } when (text = @ss.scan(/\#([_A-Za-z0-9-]|[^\0-\177]|\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f])+/)) action { [:HASH, text] } when (text = @ss.scan(/[\s]*~=[\s]*/)) action { [:INCLUDES, text] } when (text = @ss.scan(/[\s]*\|=[\s]*/)) action { [:DASHMATCH, text] } when (text = @ss.scan(/[\s]*\^=[\s]*/)) action { [:PREFIXMATCH, text] } when (text = @ss.scan(/[\s]*\$=[\s]*/)) action { [:SUFFIXMATCH, text] } when (text = @ss.scan(/[\s]*\*=[\s]*/)) action { [:SUBSTRINGMATCH, text] } when (text = @ss.scan(/[\s]*!=[\s]*/)) action { [:NOT_EQUAL, text] } when (text = @ss.scan(/[\s]*=[\s]*/)) action { [:EQUAL, text] } when (text = @ss.scan(/[\s]*\)/)) action { [:RPAREN, text] } when (text = @ss.scan(/[\s]*\[[\s]*/)) action { [:LSQUARE, text] } when (text = @ss.scan(/[\s]*\]/)) action { [:RSQUARE, text] } when (text = @ss.scan(/[\s]*\+[\s]*/)) action { [:PLUS, text] } when (text = @ss.scan(/[\s]*>[\s]*/)) action { [:GREATER, text] } when (text = @ss.scan(/[\s]*,[\s]*/)) action { [:COMMA, text] } when (text = @ss.scan(/[\s]*~[\s]*/)) action { [:TILDE, text] } when (text = @ss.scan(/\:not\([\s]*/)) action { [:NOT, text] } when (text = @ss.scan(/-?([0-9]+|[0-9]*\.[0-9]+)/)) action { [:NUMBER, text] } when (text = @ss.scan(/[\s]*\/\/[\s]*/)) action { [:DOUBLESLASH, text] } when (text = @ss.scan(/[\s]*\/[\s]*/)) action { 
[:SLASH, text] } when (text = @ss.scan(/U\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?/)) action {[:UNICODE_RANGE, text] } when (text = @ss.scan(/[\s]+/)) action { [:S, text] } when (text = @ss.scan(/"([^\n\r\f"]|\n|\r\n|\r|\f|[^\0-\177]|\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f])*"|'([^\n\r\f']|\n|\r\n|\r|\f|[^\0-\177]|\\[0-9A-Fa-f]{1,6}(\r\n|[\s])?|\\[^\n\r\f0-9A-Fa-f])*'/)) action { [:STRING, text] } when (text = @ss.scan(/./)) action { [text, text] } else text = @ss.string[@ss.pos .. -1] raise ScanError, "can not match: '" + text + "'" end # if else raise ScanError, "undefined state: '" + state.to_s + "'" end # case state token end # def _next_token end # class end end nokogiri-1.6.1/lib/nokogiri/html/0000755000175000017500000000000012261213762016253 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/html/builder.rb0000644000175000017500000000201012261213762020217 0ustar boutilboutilmodule Nokogiri module HTML ### # Nokogiri HTML builder is used for building HTML documents. It is very # similar to the Nokogiri::XML::Builder. In fact, you should go read the # documentation for Nokogiri::XML::Builder before reading this # documentation. # # == Synopsis: # # Create an HTML document with a body that has an onload attribute, and a # span tag with a class of "bold" that has content of "Hello world". # # builder = Nokogiri::HTML::Builder.new do |doc| # doc.html { # doc.body(:onload => 'some_func();') { # doc.span.bold { # doc.text "Hello world" # } # } # } # end # puts builder.to_html # # The HTML builder inherits from the XML builder, so make sure to read the # Nokogiri::XML::Builder documentation. class Builder < Nokogiri::XML::Builder ### # Convert the builder to HTML def to_html @doc.to_html end end end end nokogiri-1.6.1/lib/nokogiri/html/element_description.rb0000644000175000017500000000063512261213762022640 0ustar boutilboutilmodule Nokogiri module HTML class ElementDescription ### # Is this element a block element? def block? !inline? 
end ### # Convert this description to a string def to_s "#{name}: #{description}" end ### # Inspection information def inspect "#<#{self.class.name}: #{name} #{description}>" end end end end nokogiri-1.6.1/lib/nokogiri/html/entity_lookup.rb0000644000175000017500000000040312261213762021502 0ustar boutilboutilmodule Nokogiri module HTML class EntityDescription < Struct.new(:value, :name, :description); end class EntityLookup ### # Look up entity with +name+ def [] name (val = get(name)) && val.value end end end end nokogiri-1.6.1/lib/nokogiri/html/sax/0000755000175000017500000000000012261213762017046 5ustar boutilboutilnokogiri-1.6.1/lib/nokogiri/html/sax/parser.rb0000644000175000017500000000320112261213762020663 0ustar boutilboutilmodule Nokogiri module HTML ### # Nokogiri lets you write a SAX parser to process HTML but get HTML # correction features. # # See Nokogiri::HTML::SAX::Parser for a basic example of using a # SAX parser with HTML. # # For more information on SAX parsers, see Nokogiri::XML::SAX module SAX ### # This class lets you perform SAX style parsing on HTML with HTML # error correction. # # Here is a basic usage example: # # class MyDoc < Nokogiri::XML::SAX::Document # def start_element name, attributes = [] # puts "found a #{name}" # end # end # # parser = Nokogiri::HTML::SAX::Parser.new(MyDoc.new) # parser.parse(File.read(ARGV[0], :mode => 'rb')) # # For more information on SAX parsers, see Nokogiri::XML::SAX class Parser < Nokogiri::XML::SAX::Parser ### # Parse html stored in +data+ using +encoding+ def parse_memory data, encoding = 'UTF-8' raise ArgumentError unless data return unless data.length > 0 ctx = ParserContext.memory(data, encoding) yield ctx if block_given?
ctx.parse_with self end ### # Parse a file with +filename+ def parse_file filename, encoding = 'UTF-8' raise ArgumentError unless filename raise Errno::ENOENT unless File.exists?(filename) raise Errno::EISDIR if File.directory?(filename) ctx = ParserContext.file(filename, encoding) yield ctx if block_given? ctx.parse_with self end end end end end nokogiri-1.6.1/lib/nokogiri/html/sax/parser_context.rb0000644000175000017500000000074112261213762022435 0ustar boutilboutilmodule Nokogiri module HTML module SAX ### # Context for HTML SAX parsers. This class is usually not instantiated # by the user. Instead, you should be looking at # Nokogiri::HTML::SAX::Parser class ParserContext < Nokogiri::XML::SAX::ParserContext def self.new thing, encoding = 'UTF-8' [:read, :close].all? { |x| thing.respond_to?(x) } ? super : memory(thing, encoding) end end end end end nokogiri-1.6.1/lib/nokogiri/html/sax/push_parser.rb0000644000175000017500000000065112261213762021730 0ustar boutilboutilmodule Nokogiri module HTML module SAX class PushParser def initialize(doc = XML::SAX::Document.new, file_name = nil, encoding = 'UTF-8') @document = doc @encoding = encoding @sax_parser = HTML::SAX::Parser.new(doc, @encoding) ## Create our push parser context initialize_native(@sax_parser, file_name, @encoding) end end end end end nokogiri-1.6.1/lib/nokogiri/html/document.rb0000644000175000017500000002050512261213762020420 0ustar boutilboutilmodule Nokogiri module HTML class Document < Nokogiri::XML::Document ### # Get the meta tag encoding for this document. If there is no meta tag, # then nil is returned. def meta_encoding meta = meta_content_type and match = /charset\s*=\s*([\w-]+)/i.match(meta['content']) and match[1] end ### # Set the meta tag encoding for this document. If there is no meta # content tag, the encoding is not set. 
def meta_encoding= encoding meta = meta_content_type and meta['content'] = "text/html; charset=%s" % encoding end def meta_content_type css('meta[@http-equiv]').find { |node| node['http-equiv'] =~ /\AContent-Type\z/i and !node['content'].nil? and !node['content'].empty? } end private :meta_content_type ### # Get the title string of this document. Return nil if there is # no title tag. def title title = at('title') and title.inner_text end ### # Set the title string of this document. If there is no head # element, the title is not set. def title=(text) unless title = at('title') head = at('head') or return nil title = Nokogiri::XML::Node.new('title', self) head << title end title.children = XML::Text.new(text, self) end #### # Serialize Node using +options+. Save options can also be set using a # block. See SaveOptions. # # These two statements are equivalent: # # node.serialize(:encoding => 'UTF-8', :save_with => FORMAT | AS_XML) # # or # # node.serialize(:encoding => 'UTF-8') do |config| # config.format.as_xml # end # def serialize options = {} options[:save_with] ||= XML::Node::SaveOptions::DEFAULT_HTML super end #### # Create a Nokogiri::XML::DocumentFragment from +tags+ def fragment tags = nil DocumentFragment.new(self, tags, self.root) end class << self ### # Parse HTML. +string_or_io+ may be a String, or any object that # responds to _read_ and _close_ such as an IO, or StringIO. # +url+ is resource where this document is located. +encoding+ is the # encoding that should be used when processing the document. +options+ # is a number that sets options in the parser, such as # Nokogiri::XML::ParseOptions::RECOVER. See the constants in # Nokogiri::XML::ParseOptions. def parse string_or_io, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML options = Nokogiri::XML::ParseOptions.new(options) if Fixnum === options # Give the options to the user yield options if block_given? 
if string_or_io.respond_to?(:encoding) unless string_or_io.encoding.name == "ASCII-8BIT" encoding ||= string_or_io.encoding.name end end if string_or_io.respond_to?(:read) url ||= string_or_io.respond_to?(:path) ? string_or_io.path : nil if !encoding # Libxml2's parser has poor support for encoding # detection. First, it does not recognize the HTML5 # style meta charset declaration. Secondly, even if it # successfully detects an encoding hint, it does not # re-decode or re-parse the preceding part which may be # garbled. # # EncodingReader aims to perform advanced encoding # detection beyond what Libxml2 does, and to emulate # rewinding of a stream and make Libxml2 redo parsing # from the start when an encoding hint is found. string_or_io = EncodingReader.new(string_or_io) begin return read_io(string_or_io, url, encoding, options.to_i) rescue EncodingFound => e encoding = e.found_encoding end end return read_io(string_or_io, url, encoding, options.to_i) end # read_memory pukes on empty docs return new if string_or_io.nil? or string_or_io.empty? 
encoding ||= EncodingReader.detect_encoding(string_or_io) read_memory(string_or_io, url, encoding, options.to_i) end end class EncodingFound < StandardError # :nodoc: attr_reader :found_encoding def initialize(encoding) @found_encoding = encoding super("encoding found: %s" % encoding) end end class EncodingReader # :nodoc: class SAXHandler < Nokogiri::XML::SAX::Document # :nodoc: attr_reader :encoding def initialize @encoding = nil super() end def start_element(name, attrs = []) return unless name == 'meta' attr = Hash[attrs] charset = attr['charset'] and @encoding = charset http_equiv = attr['http-equiv'] and http_equiv.match(/\AContent-Type\z/i) and content = attr['content'] and m = content.match(/;\s*charset\s*=\s*([\w-]+)/) and @encoding = m[1] end end class JumpSAXHandler < SAXHandler def initialize(jumptag) @jumptag = jumptag super() end def start_element(name, attrs = []) super throw @jumptag, @encoding if @encoding throw @jumptag, nil if name =~ /\A(?:div|h1|img|p|br)\z/ end end def self.detect_encoding(chunk) if Nokogiri.jruby? && EncodingReader.is_jruby_without_fix? return EncodingReader.detect_encoding_for_jruby_without_fix(chunk) end m = chunk.match(/\A(<\?xml[ \t\r\n]+[^>]*>)/) and return Nokogiri.XML(m[1]).encoding if Nokogiri.jruby? m = chunk.match(/(]*>)/) and return Nokogiri.XML(m[1]).encoding m = chunk.match(/( 0 rest = @io.read(len) and ret << rest end if ret.empty? nil else ret end end end end end end nokogiri-1.6.1/lib/nokogiri/html/element_description_defaults.rb0000644000175000017500000006271312261213762024534 0ustar boutilboutilmodule Nokogiri module HTML class ElementDescription # Methods are defined protected by method_defined? because at # this point the C-library or Java library is already loaded, # and we don't want to clobber any methods that have been # defined there. 
Desc = Struct.new("HTMLElementDescription", :name, :startTag, :endTag, :saveEndTag, :empty, :depr, :dtd, :isinline, :desc, :subelts, :defaultsubelt, :attrs_opt, :attrs_depr, :attrs_req) # This is filled in down below. DefaultDescriptions = Hash.new() def default_desc DefaultDescriptions[name.downcase] end private :default_desc unless method_defined? :implied_start_tag? def implied_start_tag? d = default_desc d ? d.startTag : nil end end unless method_defined? :implied_end_tag? def implied_end_tag? d = default_desc d ? d.endTag : nil end end unless method_defined? :save_end_tag? def save_end_tag? d = default_desc d ? d.saveEndTag : nil end end unless method_defined? :deprecated? def deprecated? d = default_desc d ? d.depr : nil end end unless method_defined? :description def description d = default_desc d ? d.desc : nil end end unless method_defined? :default_sub_element def default_sub_element d = default_desc d ? d.defaultsubelt : nil end end unless method_defined? :optional_attributes def optional_attributes d = default_desc d ? d.attrs_opt : [] end end unless method_defined? :deprecated_attributes def deprecated_attributes d = default_desc d ? d.attrs_depr : [] end end unless method_defined? :required_attributes def required_attributes d = default_desc d ? d.attrs_req : [] end end ### # Default Element Descriptions (HTML 4.0) copied from # libxml2/HTMLparser.c and libxml2/include/libxml/HTMLparser.h # # The copyright notice for those files and the following list of # element and attribute descriptions is reproduced here: # # Except where otherwise noted in the source code (e.g. the # files hash.c, list.c and the trio files, which are covered by # a similar licence but with different Copyright notices) all # the files are: # # Copyright (C) 1998-2003 Daniel Veillard. All Rights Reserved. 
# # Permission is hereby granted, free of charge, to any person # obtaining a copy of this software and associated documentation # files (the "Software"), to deal in the Software without # restriction, including without limitation the rights to use, # copy, modify, merge, publish, distribute, sublicense, and/or # sell copies of the Software, and to permit persons to whom the # Software is furnished to do so, subject to the following # conditions: # The above copyright notice and this permission notice shall be # included in all copies or substantial portions of the # Software. # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY # KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE # WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR # PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE DANIEL # VEILLARD BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, # WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE # OR OTHER DEALINGS IN THE SOFTWARE. # Except as contained in this notice, the name of Daniel # Veillard shall not be used in advertising or otherwise to # promote the sale, use or other dealings in this Software # without prior written authorization from him.
# Attributes defined and categorized FONTSTYLE = ["tt", "i", "b", "u", "s", "strike", "big", "small"] PHRASE = ['em', 'strong', 'dfn', 'code', 'samp', 'kbd', 'var', 'cite', 'abbr', 'acronym'] SPECIAL = ['a', 'img', 'applet', 'embed', 'object', 'font','basefont', 'br', 'script', 'map', 'q', 'sub', 'sup', 'span', 'bdo', 'iframe'] PCDATA = [] HEADING = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'] LIST = ['ul', 'ol', 'dir', 'menu'] FORMCTRL = ['input', 'select', 'textarea', 'label', 'button'] BLOCK = [HEADING, LIST, 'pre', 'p', 'dl', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'form', 'isindex', 'hr', 'table', 'fieldset', 'address'] INLINE = [PCDATA, FONTSTYLE, PHRASE, SPECIAL, FORMCTRL] FLOW = [BLOCK, INLINE] MODIFIER = [] EMPTY = [] HTML_FLOW = FLOW HTML_INLINE = INLINE HTML_PCDATA = PCDATA HTML_CDATA = HTML_PCDATA COREATTRS = ['id', 'class', 'style', 'title'] I18N = ['lang', 'dir'] EVENTS = ['onclick', 'ondblclick', 'onmousedown', 'onmouseup', 'onmouseover', 'onmouseout', 'onkeypress', 'onkeydown', 'onkeyup'] ATTRS = [COREATTRS, I18N,EVENTS] CELLHALIGN = ['align', 'char', 'charoff'] CELLVALIGN = ['valign'] HTML_ATTRS = ATTRS CORE_I18N_ATTRS = [COREATTRS, I18N] CORE_ATTRS = COREATTRS I18N_ATTRS = I18N A_ATTRS = [ATTRS, 'charset', 'type', 'name', 'href', 'hreflang', 'rel', 'rev', 'accesskey', 'shape', 'coords', 'tabindex', 'onfocus', 'onblur'] TARGET_ATTR = ['target'] ROWS_COLS_ATTR = ['rows', 'cols'] ALT_ATTR = ['alt'] SRC_ALT_ATTRS = ['src', 'alt'] HREF_ATTRS = ['href'] CLEAR_ATTRS = ['clear'] INLINE_P = [INLINE, 'p'] FLOW_PARAM = [FLOW, 'param'] APPLET_ATTRS = [COREATTRS , 'codebase', 'archive', 'alt', 'name', 'height', 'width', 'align', 'hspace', 'vspace'] AREA_ATTRS = ['shape', 'coords', 'href', 'nohref', 'tabindex', 'accesskey', 'onfocus', 'onblur'] BASEFONT_ATTRS = ['id', 'size', 'color', 'face'] QUOTE_ATTRS = [ATTRS, 'cite'] BODY_CONTENTS = [FLOW, 'ins', 'del'] BODY_ATTRS = [ATTRS, 'onload', 'onunload'] BODY_DEPR = ['background', 'bgcolor', 'text', 'link', 
'vlink', 'alink'] BUTTON_ATTRS = [ATTRS, 'name', 'value', 'type', 'disabled', 'tabindex', 'accesskey', 'onfocus', 'onblur'] COL_ATTRS = [ATTRS, 'span', 'width', CELLHALIGN, CELLVALIGN] COL_ELT = ['col'] EDIT_ATTRS = [ATTRS, 'datetime', 'cite'] COMPACT_ATTRS = [ATTRS, 'compact'] DL_CONTENTS = ['dt', 'dd'] COMPACT_ATTR = ['compact'] LABEL_ATTR = ['label'] FIELDSET_CONTENTS = [FLOW, 'legend' ] FONT_ATTRS = [COREATTRS, I18N, 'size', 'color', 'face' ] FORM_CONTENTS = [HEADING, LIST, INLINE, 'pre', 'p', 'div', 'center', 'noscript', 'noframes', 'blockquote', 'isindex', 'hr', 'table', 'fieldset', 'address'] FORM_ATTRS = [ATTRS, 'method', 'enctype', 'accept', 'name', 'onsubmit', 'onreset', 'accept-charset'] FRAME_ATTRS = [COREATTRS, 'longdesc', 'name', 'src', 'frameborder', 'marginwidth', 'marginheight', 'noresize', 'scrolling' ] FRAMESET_ATTRS = [COREATTRS, 'rows', 'cols', 'onload', 'onunload'] FRAMESET_CONTENTS = ['frameset', 'frame', 'noframes'] HEAD_ATTRS = [I18N, 'profile'] HEAD_CONTENTS = ['title', 'isindex', 'base', 'script', 'style', 'meta', 'link', 'object'] HR_DEPR = ['align', 'noshade', 'size', 'width'] VERSION_ATTR = ['version'] HTML_CONTENT = ['head', 'body', 'frameset'] IFRAME_ATTRS = [COREATTRS, 'longdesc', 'name', 'src', 'frameborder', 'marginwidth', 'marginheight', 'scrolling', 'align', 'height', 'width'] IMG_ATTRS = [ATTRS, 'longdesc', 'name', 'height', 'width', 'usemap', 'ismap'] EMBED_ATTRS = [COREATTRS, 'align', 'alt', 'border', 'code', 'codebase', 'frameborder', 'height', 'hidden', 'hspace', 'name', 'palette', 'pluginspace', 'pluginurl', 'src', 'type', 'units', 'vspace', 'width'] INPUT_ATTRS = [ATTRS, 'type', 'name', 'value', 'checked', 'disabled', 'readonly', 'size', 'maxlength', 'src', 'alt', 'usemap', 'ismap', 'tabindex', 'accesskey', 'onfocus', 'onblur', 'onselect', 'onchange', 'accept'] PROMPT_ATTRS = [COREATTRS, I18N, 'prompt'] LABEL_ATTRS = [ATTRS, 'for', 'accesskey', 'onfocus', 'onblur'] LEGEND_ATTRS = [ATTRS, 'accesskey'] ALIGN_ATTR = 
['align'] LINK_ATTRS = [ATTRS, 'charset', 'href', 'hreflang', 'type', 'rel', 'rev', 'media'] MAP_CONTENTS = [BLOCK, 'area'] NAME_ATTR = ['name'] ACTION_ATTR = ['action'] BLOCKLI_ELT = [BLOCK, 'li'] META_ATTRS = [I18N, 'http-equiv', 'name', 'scheme'] CONTENT_ATTR = ['content'] TYPE_ATTR = ['type'] NOFRAMES_CONTENT = ['body', FLOW, MODIFIER] OBJECT_CONTENTS = [FLOW, 'param'] OBJECT_ATTRS = [ATTRS, 'declare', 'classid', 'codebase', 'data', 'type', 'codetype', 'archive', 'standby', 'height', 'width', 'usemap', 'name', 'tabindex'] OBJECT_DEPR = ['align', 'border', 'hspace', 'vspace'] OL_ATTRS = ['type', 'compact', 'start'] OPTION_ELT = ['option'] OPTGROUP_ATTRS = [ATTRS, 'disabled'] OPTION_ATTRS = [ATTRS, 'disabled', 'label', 'selected', 'value'] PARAM_ATTRS = ['id', 'value', 'valuetype', 'type'] WIDTH_ATTR = ['width'] PRE_CONTENT = [PHRASE, 'tt', 'i', 'b', 'u', 's', 'strike', 'a', 'br', 'script', 'map', 'q', 'span', 'bdo', 'iframe'] SCRIPT_ATTRS = ['charset', 'src', 'defer', 'event', 'for'] LANGUAGE_ATTR = ['language'] SELECT_CONTENT = ['optgroup', 'option'] SELECT_ATTRS = [ATTRS, 'name', 'size', 'multiple', 'disabled', 'tabindex', 'onfocus', 'onblur', 'onchange'] STYLE_ATTRS = [I18N, 'media', 'title'] TABLE_ATTRS = [ATTRS, 'summary', 'width', 'border', 'frame', 'rules', 'cellspacing', 'cellpadding', 'datapagesize'] TABLE_DEPR = ['align', 'bgcolor'] TABLE_CONTENTS = ['caption', 'col', 'colgroup', 'thead', 'tfoot', 'tbody', 'tr'] TR_ELT = ['tr'] TALIGN_ATTRS = [ATTRS, CELLHALIGN, CELLVALIGN] TH_TD_DEPR = ['nowrap', 'bgcolor', 'width', 'height'] TH_TD_ATTR = [ATTRS, 'abbr', 'axis', 'headers', 'scope', 'rowspan', 'colspan', CELLHALIGN, CELLVALIGN] TEXTAREA_ATTRS = [ATTRS, 'name', 'disabled', 'readonly', 'tabindex', 'accesskey', 'onfocus', 'onblur', 'onselect', 'onchange'] TR_CONTENTS = ['th', 'td'] BGCOLOR_ATTR = ['bgcolor'] LI_ELT = ['li'] UL_DEPR = ['type', 'compact'] DIR_ATTR = ['dir'] [ ['a', false, false, false, false, false, :any, true, 'anchor ', HTML_INLINE, nil, 
A_ATTRS, TARGET_ATTR, [] ], ['abbr', false, false, false, false, false, :any, true, 'abbreviated form', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['acronym', false, false, false, false, false, :any, true, '', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['address', false, false, false, false, false, :any, false, 'information on author', INLINE_P , nil, HTML_ATTRS, [], [] ], ['applet', false, false, false, false, true, :loose, true, 'java applet ', FLOW_PARAM, nil, [], APPLET_ATTRS, [] ], ['area', false, true, true, true, false, :any, false, 'client-side image map area ', EMPTY, nil, AREA_ATTRS, TARGET_ATTR, ALT_ATTR ], ['b', false, true, false, false, false, :any, true, 'bold text style', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['base', false, true, true, true, false, :any, false, 'document base uri ', EMPTY, nil, [], TARGET_ATTR, HREF_ATTRS ], ['basefont', false, true, true, true, true, :loose, true, 'base font size ', EMPTY, nil, [], BASEFONT_ATTRS, [] ], ['bdo', false, false, false, false, false, :any, true, 'i18n bidi over-ride ', HTML_INLINE, nil, CORE_I18N_ATTRS, [], DIR_ATTR ], ['big', false, true, false, false, false, :any, true, 'large text style', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['blockquote', false, false, false, false, false, :any, false, 'long quotation ', HTML_FLOW, nil, QUOTE_ATTRS, [], [] ], ['body', true, true, false, false, false, :any, false, 'document body ', BODY_CONTENTS, 'div', BODY_ATTRS, BODY_DEPR, [] ], ['br', false, true, true, true, false, :any, true, 'forced line break ', EMPTY, nil, CORE_ATTRS, CLEAR_ATTRS, [] ], ['button', false, false, false, false, false, :any, true, 'push button ', [HTML_FLOW, MODIFIER], nil, BUTTON_ATTRS, [], [] ], ['caption', false, false, false, false, false, :any, false, 'table caption ', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['center', false, true, false, false, true, :loose, false, 'shorthand for div align=center ', HTML_FLOW, nil, [], HTML_ATTRS, [] ], ['cite', false, false, false, false, false, :any, 
true, 'citation', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['code', false, false, false, false, false, :any, true, 'computer code fragment', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['col', false, true, true, true, false, :any, false, 'table column ', EMPTY, nil, COL_ATTRS, [], [] ], ['colgroup', false, true, false, false, false, :any, false, 'table column group ', COL_ELT, 'col', COL_ATTRS, [], [] ], ['dd', false, true, false, false, false, :any, false, 'definition description ', HTML_FLOW, nil, HTML_ATTRS, [], [] ], ['del', false, false, false, false, false, :any, true, 'deleted text ', HTML_FLOW, nil, EDIT_ATTRS, [], [] ], ['dfn', false, false, false, false, false, :any, true, 'instance definition', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['dir', false, false, false, false, true, :loose, false, 'directory list', BLOCKLI_ELT, 'li', [], COMPACT_ATTRS, [] ], ['div', false, false, false, false, false, :any, false, 'generic language/style container', HTML_FLOW, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['dl', false, false, false, false, false, :any, false, 'definition list ', DL_CONTENTS, 'dd', HTML_ATTRS, COMPACT_ATTR, [] ], ['dt', false, true, false, false, false, :any, false, 'definition term ', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['em', false, true, false, false, false, :any, true, 'emphasis', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['embed', false, true, false, false, true, :loose, true, 'generic embedded object ', EMPTY, nil, EMBED_ATTRS, [], [] ], ['fieldset', false, false, false, false, false, :any, false, 'form control group ', FIELDSET_CONTENTS, nil, HTML_ATTRS, [], [] ], ['font', false, true, false, false, true, :loose, true, 'local change to font ', HTML_INLINE, nil, [], FONT_ATTRS, [] ], ['form', false, false, false, false, false, :any, false, 'interactive form ', FORM_CONTENTS, 'fieldset', FORM_ATTRS, TARGET_ATTR, ACTION_ATTR ], ['frame', false, true, true, true, false, :frameset, false, 'subwindow ', EMPTY, nil, [], FRAME_ATTRS, [] ], ['frameset', false, 
false, false, false, false, :frameset, false, 'window subdivision', FRAMESET_CONTENTS, 'noframes', [], FRAMESET_ATTRS, [] ], ['h1', false, false, false, false, false, :any, false, 'heading ', HTML_INLINE, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['h2', false, false, false, false, false, :any, false, 'heading ', HTML_INLINE, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['h3', false, false, false, false, false, :any, false, 'heading ', HTML_INLINE, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['h4', false, false, false, false, false, :any, false, 'heading ', HTML_INLINE, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['h5', false, false, false, false, false, :any, false, 'heading ', HTML_INLINE, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['h6', false, false, false, false, false, :any, false, 'heading ', HTML_INLINE, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['head', true, true, false, false, false, :any, false, 'document head ', HEAD_CONTENTS, nil, HEAD_ATTRS, [], [] ], ['hr', false, true, true, true, false, :any, false, 'horizontal rule ', EMPTY, nil, HTML_ATTRS, HR_DEPR, [] ], ['html', true, true, false, false, false, :any, false, 'document root element ', HTML_CONTENT, nil, I18N_ATTRS, VERSION_ATTR, [] ], ['i', false, true, false, false, false, :any, true, 'italic text style', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['iframe', false, false, false, false, false, :any, true, 'inline subwindow ', HTML_FLOW, nil, [], IFRAME_ATTRS, [] ], ['img', false, true, true, true, false, :any, true, 'embedded image ', EMPTY, nil, IMG_ATTRS, ALIGN_ATTR, SRC_ALT_ATTRS ], ['input', false, true, true, true, false, :any, true, 'form control ', EMPTY, nil, INPUT_ATTRS, ALIGN_ATTR, [] ], ['ins', false, false, false, false, false, :any, true, 'inserted text', HTML_FLOW, nil, EDIT_ATTRS, [], [] ], ['isindex', false, true, true, true, true, :loose, false, 'single line prompt ', EMPTY, nil, [], PROMPT_ATTRS, [] ], ['kbd', false, false, false, false, false, :any, true, 'text to be entered by the user', HTML_INLINE, nil, HTML_ATTRS,
[], [] ], ['label', false, false, false, false, false, :any, true, 'form field label text ', [HTML_INLINE, MODIFIER], nil, LABEL_ATTRS, [], [] ], ['legend', false, false, false, false, false, :any, false, 'fieldset legend ', HTML_INLINE, nil, LEGEND_ATTRS, ALIGN_ATTR, [] ], ['li', false, true, true, false, false, :any, false, 'list item ', HTML_FLOW, nil, HTML_ATTRS, [], [] ], ['link', false, true, true, true, false, :any, false, 'a media-independent link ', EMPTY, nil, LINK_ATTRS, TARGET_ATTR, [] ], ['map', false, false, false, false, false, :any, true, 'client-side image map ', MAP_CONTENTS, nil, HTML_ATTRS, [], NAME_ATTR ], ['menu', false, false, false, false, true, :loose, false, 'menu list ', BLOCKLI_ELT, nil, [], COMPACT_ATTRS, [] ], ['meta', false, true, true, true, false, :any, false, 'generic metainformation ', EMPTY, nil, META_ATTRS, [], CONTENT_ATTR ], ['noframes', false, false, false, false, false, :frameset, false, 'alternate content container for non frame-based rendering ', NOFRAMES_CONTENT, 'body', HTML_ATTRS, [], [] ], ['noscript', false, false, false, false, false, :any, false, 'alternate content container for non script-based rendering ', HTML_FLOW, 'div', HTML_ATTRS, [], [] ], ['object', false, false, false, false, false, :any, true, 'generic embedded object ', OBJECT_CONTENTS, 'div', OBJECT_ATTRS, OBJECT_DEPR, [] ], ['ol', false, false, false, false, false, :any, false, 'ordered list ', LI_ELT, 'li', HTML_ATTRS, OL_ATTRS, [] ], ['optgroup', false, false, false, false, false, :any, false, 'option group ', OPTION_ELT, 'option', OPTGROUP_ATTRS, [], LABEL_ATTR ], ['option', false, true, false, false, false, :any, false, 'selectable choice ', HTML_PCDATA, nil, OPTION_ATTRS, [], [] ], ['p', false, true, false, false, false, :any, false, 'paragraph ', HTML_INLINE, nil, HTML_ATTRS, ALIGN_ATTR, [] ], ['param', false, true, true, true, false, :any, false, 'named property value ', EMPTY, nil, PARAM_ATTRS, [], NAME_ATTR ], ['pre', false, false, false, 
false, false, :any, false, 'preformatted text ', PRE_CONTENT, nil, HTML_ATTRS, WIDTH_ATTR, [] ], ['q', false, false, false, false, false, :any, true, 'short inline quotation ', HTML_INLINE, nil, QUOTE_ATTRS, [], [] ], ['s', false, true, false, false, true, :loose, true, 'strike-through text style', HTML_INLINE, nil, [], HTML_ATTRS, [] ], ['samp', false, false, false, false, false, :any, true, 'sample program output, scripts, etc.', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['script', false, false, false, false, false, :any, true, 'script statements ', HTML_CDATA, nil, SCRIPT_ATTRS, LANGUAGE_ATTR, TYPE_ATTR ], ['select', false, false, false, false, false, :any, true, 'option selector ', SELECT_CONTENT, nil, SELECT_ATTRS, [], [] ], ['small', false, true, false, false, false, :any, true, 'small text style', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['span', false, false, false, false, false, :any, true, 'generic language/style container ', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['strike', false, true, false, false, true, :loose, true, 'strike-through text', HTML_INLINE, nil, [], HTML_ATTRS, [] ], ['strong', false, true, false, false, false, :any, true, 'strong emphasis', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['style', false, false, false, false, false, :any, false, 'style info ', HTML_CDATA, nil, STYLE_ATTRS, [], TYPE_ATTR ], ['sub', false, true, false, false, false, :any, true, 'subscript', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['sup', false, true, false, false, false, :any, true, 'superscript ', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['table', false, false, false, false, false, :any, false, '', TABLE_CONTENTS, 'tr', TABLE_ATTRS, TABLE_DEPR, [] ], ['tbody', true, false, false, false, false, :any, false, 'table body ', TR_ELT, 'tr', TALIGN_ATTRS, [], [] ], ['td', false, false, false, false, false, :any, false, 'table data cell', HTML_FLOW, nil, TH_TD_ATTR, TH_TD_DEPR, [] ], ['textarea', false, false, false, false, false, :any, true, 'multi-line text field ', 
HTML_PCDATA, nil, TEXTAREA_ATTRS, [], ROWS_COLS_ATTR ], ['tfoot', false, true, false, false, false, :any, false, 'table footer ', TR_ELT, 'tr', TALIGN_ATTRS, [], [] ], ['th', false, true, false, false, false, :any, false, 'table header cell', HTML_FLOW, nil, TH_TD_ATTR, TH_TD_DEPR, [] ], ['thead', false, true, false, false, false, :any, false, 'table header ', TR_ELT, 'tr', TALIGN_ATTRS, [], [] ], ['title', false, false, false, false, false, :any, false, 'document title ', HTML_PCDATA, nil, I18N_ATTRS, [], [] ], ['tr', false, false, false, false, false, :any, false, 'table row ', TR_CONTENTS, 'td', TALIGN_ATTRS, BGCOLOR_ATTR, [] ], ['tt', false, true, false, false, false, :any, true, 'teletype or monospaced text style', HTML_INLINE, nil, HTML_ATTRS, [], [] ], ['u', false, true, false, false, true, :loose, true, 'underlined text style', HTML_INLINE, nil, [], HTML_ATTRS, [] ], ['ul', false, false, false, false, false, :any, false, 'unordered list ', LI_ELT, 'li', HTML_ATTRS, UL_DEPR, [] ], ['var', false, false, false, false, false, :any, true, 'instance of a variable or program argument', HTML_INLINE, nil, HTML_ATTRS, [], [] ] ].each do |descriptor| name = descriptor[0] begin d = Desc.new(*descriptor) # flatten all the attribute lists (Ruby1.9, *[a,b,c] can be # used to flatten a literal list, but not in Ruby1.8). d[:subelts] = d[:subelts].flatten d[:attrs_opt] = d[:attrs_opt].flatten d[:attrs_depr] = d[:attrs_depr].flatten d[:attrs_req] = d[:attrs_req].flatten rescue => e p name raise e end DefaultDescriptions[name] = d end end end end nokogiri-1.6.1/lib/nokogiri/html/document_fragment.rb0000644000175000017500000000234312261213762022303 0ustar boutilboutilmodule Nokogiri module HTML class DocumentFragment < Nokogiri::XML::DocumentFragment attr_accessor :errors #### # Create a Nokogiri::XML::DocumentFragment from +tags+, using +encoding+ def self.parse tags, encoding = nil doc = HTML::Document.new encoding ||= tags.respond_to?(:encoding) ? 
tags.encoding.name : 'UTF-8' doc.encoding = encoding new(doc, tags) end def initialize document, tags = nil, ctx = nil return self unless tags if ctx preexisting_errors = document.errors.dup node_set = ctx.parse("<div>#{tags}</div>") node_set.first.children.each { |child| child.parent = self } unless node_set.empty? self.errors = document.errors - preexisting_errors else # This is a horrible hack, but I don't care if tags.strip =~ /^<body/i path = "/html/body" else path = "/html/body/node()" end temp_doc = HTML::Document.parse "<html><body>#{tags}", nil, document.encoding temp_doc.xpath(path).each { |child| child.parent = self } self.errors = temp_doc.errors end children end end end end nokogiri-1.6.1/lib/nokogiri/version.rb0000644000175000017500000000550312261213762017324 0ustar boutilboutilmodule Nokogiri # The version of Nokogiri you are using VERSION = '1.6.1' class VersionInfo # :nodoc: def jruby? ::JRUBY_VERSION if RUBY_PLATFORM == "java" end def engine defined?(RUBY_ENGINE) ? RUBY_ENGINE : 'mri' end def loaded_parser_version LIBXML_PARSER_VERSION.scan(/^(\d+)(\d\d)(\d\d)(?!\d)/).first.collect{ |j| j.to_i }.join(".") end def compiled_parser_version LIBXML_VERSION end def libxml2? defined?(LIBXML_VERSION) end def libxml2_using_system? ! libxml2_using_packaged? end def libxml2_using_packaged? NOKOGIRI_USE_PACKAGED_LIBRARIES end def warnings return [] unless libxml2? if compiled_parser_version != loaded_parser_version ["Nokogiri was built against LibXML version #{compiled_parser_version}, but has dynamically loaded #{loaded_parser_version}"] else [] end end def to_hash hash_info = {} hash_info['warnings'] = [] hash_info['nokogiri'] = Nokogiri::VERSION hash_info['ruby'] = {} hash_info['ruby']['version'] = ::RUBY_VERSION hash_info['ruby']['platform'] = ::RUBY_PLATFORM hash_info['ruby']['description'] = ::RUBY_DESCRIPTION hash_info['ruby']['engine'] = engine hash_info['ruby']['jruby'] = jruby? if jruby? if libxml2? hash_info['libxml'] = {} hash_info['libxml']['binding'] = 'extension' if libxml2_using_packaged? 
hash_info['libxml']['source'] = "packaged" hash_info['libxml']['libxml2_path'] = NOKOGIRI_LIBXML2_PATH hash_info['libxml']['libxslt_path'] = NOKOGIRI_LIBXSLT_PATH else hash_info['libxml']['source'] = "system" end hash_info['libxml']['compiled'] = compiled_parser_version hash_info['libxml']['loaded'] = loaded_parser_version hash_info['warnings'] = warnings elsif jruby? hash_info['xerces'] = Nokogiri::XERCES_VERSION hash_info['nekohtml'] = Nokogiri::NEKO_VERSION end hash_info end def to_markdown begin require 'psych' rescue LoadError end require 'yaml' "# Nokogiri (#{Nokogiri::VERSION})\n" + YAML.dump(to_hash).each_line.map { |line| " #{line}" }.join end # FIXME: maybe switch to singleton? @@instance = new @@instance.warnings.each do |warning| warn "WARNING: #{warning}" end def self.instance; @@instance; end end # More complete version information about libxml VERSION_INFO = VersionInfo.instance.to_hash def self.uses_libxml? # :nodoc: VersionInfo.instance.libxml2? end def self.jruby? # :nodoc: VersionInfo.instance.jruby? end end nokogiri-1.6.1/lib/nokogiri/css.rb0000644000175000017500000000107112261213762016423 0ustar boutilboutilrequire 'nokogiri/css/node' require 'nokogiri/css/xpath_visitor' x = $-w $-w = false require 'nokogiri/css/parser' $-w = x require 'nokogiri/css/tokenizer' require 'nokogiri/css/syntax_error' module Nokogiri module CSS class << self ### # Parse this CSS selector in +selector+. Returns an AST. def parse selector Parser.new.parse selector end ### # Get the XPath for +selector+. def xpath_for selector, options={} Parser.new(options[:ns] || {}).xpath_for selector, options end end end end nokogiri-1.6.1/CHANGELOG.ja.rdoc0000644000175000017500000011623512261213762015461 0ustar boutilboutil=== 1.6.1 / 2013年12月14日 * Bugfixes * (JRuby) Fix out of memory bug when certain invalid documents are parsed. * (JRuby) Fix regression of billion-laughs vulnerability. 
#586 === 1.6.0 / 2013年6月8日 This release was based on v1.5.10 and 1.6.0.rc1, and contains changes mentioned in both. * 廃止 * Remove pre 1.9 monitoring from Travis. === 1.6.0.rc1 / 2013年4月14日 This release was based on v1.5.9, and so does not contain any fixes mentioned in the notes for v1.5.10. * 註 * 実行時依存 gem として mini_portile を追加 * Ruby 1.9.2以上のみサポート * 機能 * (MRI) libxml 2.8.0 と libxslt 1.2.26 を同梱。 環境変数 NOKOGIRI_USE_SYSTEM_LIBRARIES を設定しない限り、 gem のインストール時にコンパイルして使われる。 VERSION_INFO (および `nokogiri -v`) には、同梱の libxml またはシステムの libxml のいずれが使われているかの情報が含まれる。 * (Windows) libxml 2.8.0 に更新 * 廃止 * Ruby 1.8.7以下のサポートを終了 === 1.5.11 / 2013-11-09 * Bugfixes * (JRuby) Fix out of memory bug when certain invalid documents are parsed. * (JRuby) Fix regression of billion-laughs vulnerability. #568 === 1.5.10 / 2013年6月7日 * バグ修正 * (JRuby) JRuby 1.7.3で空のIOをパースする際の "null document" エラーを修正 #883 * (JRuby) XSDにDTDのDOCTYPEがあったときのスキーマ検証の問題を修正 #912 (Patrick Chengに感謝!) * (MRI) HTMLノードに対するdefault_sub_element呼出でsegfaultしていたのを修正 #917 * 註 * RARRAY_PTR()の代わりにrb_ary_entry()を使うように変更 (そう、Rubiniusのためにね) #877 (Dirkjan Bussinkに感謝!) * テストでTypeErrorが起きるのを修正 #900 (Cédric Boutillierに感謝!) === 1.5.9 / 2013年3月21日 * バグ修正 * prefixed attributes を持つ要素が親を付け替えられたとき名前空間を適切に扱う #869 * ネストされた HTML のなかの SVG 要素が持つ名前空間つき属性を参照したときの返り値が一致しないバグを修正 #861 * (MRI) 部分ノードをパースしたときメモリリークするバグを修正 #856 === 1.5.8 / 2013年3月19日 * バグ修正 * (JRuby) xlink:href 属性があり base_uri が無いとき、 EmptyStackException 例外が発生するのを修正 #534, #805. (ありがとう, Patrick Quinn と Brian Hoffman!) * 1.5.7 から xmlns 属性が重複するバグを修正 #865 * Nokogiri::XML::Builder を使って prefixed 名前空間をルートノードに対しても使えるようにした。 #868 === 1.5.7 / 2013年3月18日 * 機能 * Ruby 2.0 で Windows 環境をサポート * バグ修正 * エンコーディング名が小文字のとき SAX::Parser.parse_io が例外を投げるようになった。 #828 * (JRuby) Java の Nokogiri はついに 1.8 と 1.9 両方のモードで全てのテストがグリーンになった!イェーイ! 
#798, #705 * (JRuby) Nokogiri::XML::Reader が jruby で壊れていたのを修正 (pull parser なのに全ての xml ドキュメントをメモリ上にロードしていた) #831 * (JRuby) JRuby が "&" をパースすると停止してしまう #837 * (JRuby) JRuby で不正な XML 命令をパースさせると NullPointerException 例外が発生する #838 * (JRuby) Node#content= が JRuby と MRI で一致しない #839 * (JRuby) to_xhtml が自分自身に閉じタグを表すスラッシュを含む要素を正しく表示しない #834 * (JRuby) テキストノードの後ろに続くエンティティが壊れてしまう (`&` や `;` が消える) #835 * (JRuby) 空の attributes を参照すると nil が返る #818 * ".foo" のような CSS クラス名の問い合わせ時に連続した空白を単一とみなす #854 * 名前空間の扱いが MRI と JRuby の間で統一された #846, #801 (ありがとう, Michael Klein!) * (MRI) SAX パーサーが空の xml 命令をパースできるようにする #845 === 1.5.6 / 2012年12月19日 * 新機能 * XML::Document#collect_namespaces メソッドのパフォーマンスを改善した。 #761 (ありがとう、Juergen Mangler!) * SAX::Document#processing_instructionに新しいcallbackが追加 (ありがとう、Kitaiti Makoto!) * Node#native_content= メソッドでエスケープされていない文字列をセットできるようにした。 #768 * 名前空間を付けて xpath 式を書く場合に、シンボルキーを使えるようにした。#729 (ありがとう、Ben Langfeld.) * XML::Node#[]= メソッド内で受け取った引数を文字列に変換するようにした。#729 (ありがとう、Ben Langfeld.) * bin/nokogiri コマンドが $stdin からドキュメントを読んで処理できるようにした。 * bin/nokogiri -e を指定することでコマンドラインプログラムを実行できるようにした。 * (JRuby) bin/nokogiri --version は Xerces および NekoHTML のバージョンを表示。 * バグ修正 * Nokogiri はこのバージョンからXSLT変換のエラーを検出するようになった。#731 (ありがとう、Justin Fitzsimmons!) * DocumentFragment のトップレベルノードを置き換えようとした際に Error を出さない。 #775 * SAXパーザに不正なエンコーディングに渡された場合はArgumentErrorを投げるようにした。#756 (ありがとう、Bradley Schaefer!) * (JRuby) XML宣言の前にスペースがあると、ドキュメントのパーズに失敗する。(#748の修正でこれもなおっている) #790 * (JRuby) Nokogiri::XML::Node#content のJRubyの振る舞いがCRubyと同じではない。#794, #797 * (JRuby) で '#' で始まる文字列を名前とする EntityReference を作ろうとすると INVALID_CHARACTER_ERR という例外が発生する。 #719 * (JRuby) では Nodeのサブクラスのnamespaceを正しく文字列に変換しない。 #715 * (JRuby) Node#contentがこのバージョンから改行コードを正しく表示するようになった。#737 (ありがとう、Piotr Szmielew!) 
* (JRuby) recover optionが指定されている場合は宣言の無い名前空間を無視するようにした。#748 * (JRuby) 名前空間を検出するXPathが続けて実行されても例外を投げてはいけない。#764 * (JRuby) XMLを表示(出力)する際のホワイトスペースの扱いをlibxml2バージョンとさらに同様になるようにした。#771 * (JRuby) 名前空間付きの属性を含むXMLドキュメントを文字列でbuilderに追加しようとすると失敗する。#770 * (JRuby) Nokogiri::XML::Document#wrapを使って生成したドキュメントに << でノードを追加しようとすると undefined method `length' for nil:NilClassのエラーが発生する #781 * (JRuby) 開いているファイルのデスクリプタを閉じようとすると、"bad file descriptor" が発生する。#495 * (JRuby) 属性デコレータに関するJRubyとCRubyの非互換性 #785 * (JRuby) DOCTYPE宣言内に内部サブセットを持たない(正しい)XMLをパースする際の問題 #547, #811 * (JRuby) テキストにコロンを含むノードをパースする際の問題 #728 * (JRuby) HTML文書のDOCTYPEを正しくパース #733 * (JRuby) Builder で create_internal_subset を使った場合のXML出力にDOCTYPE宣言を含める #751 * (JRuby) JRubyでのみ、 Builder でUTF-8テキストを #text で括る必要があった #784 === 1.5.5 / 2012年6月24日 * 機能 * JRuby の1.9モードのサポートを大幅改善!イェイ! * バグ修正 * JRuby Nokogiri の add_previous_sibling が以前は動いていたのに今は動かない(1.5.0 -> 1.5.1)。 #691 (ありがとう, John Shahid!) * JRuby バーションは URL が引数にあたえられると HTML ドキュメントを作れない。 #674 (ありがとう, John Shahid!) * JRuby バージョンは HTMLとして nil か "" が与えられると NullPointerException を投げる。 #699 * JRuby 1.9 モードでエラー, uncaught throw 'encoding_found', が発生する。 #673 * JRuby で US-ASCII にエンコードされた文字列が正しくないエンコードを返してくる。 #583 * 512 文字以上が与えられたときに XmlSaxPushParser が IndexOutOfBoundsException を投げる。#567, #615 * Xpath を評価した結果、空の NodeSet が帰ってくる場合に、NodeSet が持っている Document の decorate に失敗して例外が投げられる。#514 * JRuby で xpath を namespace 付きで指定した場合に、エラーが発生する。pull request #681 (ありがとう, Piotr Szmielew) * JRuby で Nokogiri::XML::Node を継承したクラスを定義すると、namespace が表示されない。 #695 * JRuby で RDF::RDFXML::Writer をインスタンス化しようとすると NAMESPACE_ERR (org.w3c.dom.DOMException) が発生する. #683 * JRuby で xpath に namespaces を指定すると例外が発生する. #493 * JRuby の Entity 解決は C version の Nokogiri と同じ結果にならないといけない。#704, #647, #703 === 1.5.4 / 2012年6月12日 * 機能 * "nokogiri" コマンドに `--rng` オプションが与えられと、より詳しい説明を表示するようになった。 #675 (ありがとう, Dan Radez!) * `-Werror=format-security` CFLAGを使っている hardened な Debian 系 Linux でのビルドをサポート #680. 
* pkg-config ありのシステム上でのよりよいビルドをサポート。 #584 * 複数の iconv がイストールされているシステムでのよりよいビルドをサポート。 * バグ修正 * DocumentFragment をベースにしてコメントノードを作ったときに Segmentation fault する。 #677, #678. * at() と search() メソッドで '.' をxpathとして扱う。 #690 * (MRI, Security) XML パース時のディフォルトのオプションを nonet に変更。これにより、ディフォルトでは ドキュメントパース時にネットワーク接続を行わないようにし、XXE 脆弱性に対応した。#693 パース時にネットワークに接続して外部のドキュメントを見にいかせたい場合には、以下のように `nonoet` オプションを設定する: Nokogiri::XML::Document.parse(xml) { |config| config.nononet } ここに、自分ならではの二重否定のジョークをうめこむとなおよし。 === 1.5.3 / 2012年6月1日 * 機能 * jQuery のような "prefixless" CSS セレクタ, ~ や >, + をサポート。#621, #623. (ありがとう, David Lee!) * homebrew 0.9でのインストールを改善してみる。(iconv周り) パッケージ管理って便利じゃない? * バグ修正 * カスタム xpath 関数が空の nodeset を引数に含む場合、segfault を起こす。 #634. * Nokogiri::XML::Node#css は、デフォルトの名前空間を持つXML文書に対して名前空間なしの属性セレクタをルールに含めても動作するようになった。 * Marshalにおいて、XSLTのカスタムXPath関数への引数の渡し方(および戻し方)に関するバグを修正 #640. * Nokogiri::XML::Reader#outer_xml がJRubyで正しく動作しない #617 * Nokogiri::XML::Attribute が JRuby 上で nil namespace を返す #647 * Nokogiri::XML::Node#namespace= メソッドが JRuby 上で prefix  が無い namespace を設定できない #648 * (JRuby) 1.9 モードで rake を実行するとデッドロックを引き起こす #571 * HTML::Document#meta_encoding は誤った Content-Type (charset部)を含む文書で例外を起こさなくなった #655 * コンテキスト付きフラグメントのパースで非サポートのエンコーディングが原因でRuby 1.8.7がSEGVを起こすのを修正 #643 * (JRuby) XPathパースにおける並行実行時の問題 #682 === 1.5.2 / 2012年3月9日 古いRuby用にgemspecを修正しての再パッケージ. #631, #632. === 1.5.1 / 2012年3月9日 * 新機能 * XML::Builder#comment はコメントノードを作れるようになった. * CSSセレクター検索が名前空間付き属性に対応 #593 * Java integration 機能が追加された. このバージョンから, XML::Document.wrap と XML::Document#to_java メソッドが利用可能。 * `nokogiri` CLIユーティリティがRelaxNGバリデーションに対応 #591 (thanks, Dan Radez!) * バグの修正 * エンコーディング自動認識において発生しうるメモリリークを修正 Tim Elliottに感謝! * homebrew がインストールされていたら、extconf は homebrew のパスを読む。 * Java版の一貫性のない挙動 #620 * JRuby (1.6.4/5) で Nokogiri::XML::Node を継承できなかった #560 * XML::Attr ノードは子ノードとして追加できないので例外を出す #558 * Node#add_next_sibling と Node#add_previous_sibling で隣接テキストノードをdupする条件を緩和 #595 * Java版の一貫性のない挙動: 空の属性値をnilとして返していた #589 * to_xhtml が要素が空のときに

      と誤ったタグを生成していた #557 * Document#add_child が Node, NodeSet, DocumentFragment および String を受け付けるようになった #546 * Document#create_element が("SOAP-ENV"のように)非単語構成文字を含む名前空間を認識するようになった. これは主に Builder を使う際に効いてくる. Builder はほぼすべてのものに Document#create_element を適用するためである #531 * ファイルエンコーディングが効かなかった (1.5.0 / jruby / windows) #529 * Java版において、タグに含まれる名前空間定義を属性として返さなかった #542 * Nokogiri 1.5.0で Bad file descriptor が発生していた #495 * remove_namespace! がpure Java版で動かなかった #492 * Javaネイティブ版でパースされたオブジェクトに対して ActiveSupport の .blank? メソッドを呼ぶと null pointer exception が発生していた #489 * 1.5.0 で正しい文字エンコーディングが仕様されなかった #488 * XML Builder に生のXML文字列を渡した際の問題 (JRuby) #486 * Nokogiri 1.5.0でXML生成が壊れていた (JRuby) #484 * ルートノードを複数持つことを認めない #550 * カスタムXPath関数を修正 #606 (Juan Wajnermanに感謝!) * Node#to_xml で :save_with が指定されている場合は上書きしないように修正 #505 * Node#set をプライベートメソッドに (JRuby) #564 (Nick Siegerに感謝!) * C14nの整理と Node#canonicalize (Ivan Pirlikに感謝!) #563 === 1.5.0 / 2011年7月1日 * 註 * 1.4.7からの変更点を参照 * 新機能 * 各文書形式用のデフォルトのNode::SaveOptionsの組合せを定数化. (Node::SaveOptions::DEFAULT_{X,H,XH}TML) * バグの修正 * JRuby版ではホワイトスペースの扱いに難があるため、XML出力(to_xml)において 自動整形をデフォルトでは行わないように変更. #415 * JRuby版でNodeのないNodeSetでNullPointerExceptionが発生するのを修正. #443 * エンコーディング宣言のないHTMLファイルで部分的に重複したドキュメントが生成される問題を修正した. #478 * を認識するようになった. === 1.5.0 beta3 2010年12月2日 * 註 * JRubyでの性能改善 * 1.4.4からの変更点を参照 * バグの修正 * Node#inner_textはnilを返さなくなった. (JRuby) #264 === 1.5.0 beta2 2010年7月30日 * 註 * 1.4.3からの変更点を参照 === 1.5.0 beta1 2010年5月22日 * 註 * 新しいピュアJavaバックエンドによりJRubyサポートを追加 * 廃止 * Ruby 1.8.6は非推奨となった. インストールできるかもしれないが、正式なサポートは終了. * LibXML 2.6.16および古いバージョンは非推奨. インストールできない. * FFIサポートは削除された. === 1.4.7 / 2011年7月1日 * バグの修正 * エンコーディング宣言のないHTMLファイルで部分的に重複したドキュメントが生成される問題を修正した. 
#478 === 1.4.6 / 2011年6月19日 * ノート * このバージョンは、1.4.5と機能的に同じです * Rubyの1.8.6のサポートが復元されている === 1.4.5 / 2011年5月19日 * 新機能 * Nokogiri::HTML::Document#title アクセサメソッドでHTML文書のタイトルを読み書きできる * バグの修正 * Node#serialize とその仲間達はSaveOptionオブジェクトを受け入れる * Nokogiri::CSS::Parser has-a Nokogiri::CSS::Tokenizer * (JRUBY+FFIのみ) 「弱い参照」はスレッドセーフになった. #355 * HTML::SAX::Parserから呼ばれるstart_element()コールバックのattributes引数はHTML::XML::Parserによるエミュレートコールバックと同じく連想配列になった. rel. #356 * HTML::SAX::Parserのparse*()メソッドはXML::SAX::Parser同様に渡されたブロックをコールバックするようになった. * HTMLパーサーのエンコーディング判定をlibxml2の仕様を超えて拡張・改善した. (XML宣言のencodingを認識、非ASCII文字出現後のmetaタグも文字化けを生じずに反映) * Document#remove_namespaces! は名前空間付きの属性に対応した. #396 === 1.4.4 2010年11月15日 * 新機能 * XML::Node#children=ノード内のhtml reparented node(s)を返す事によって親の変更ができる。 * XSLT はfunction extensionsをサポート。#336 * XPath はパラメーター置換を結合する. #329 * XML::Reader node typeを一定化させる. #369 * SAX Parser context は行とコラムの両方の情報を提供する * バグの修正 * XML::DTD#attributes は属性が存在しない際、nilの代わりに空のハッシュを返す * XML::DTD#{keys,each} は文字通りに機能するようになった #324 * {XML,HTML}::DocumentFragment.{new,parse} 行送りと末尾の空白を除去しなくなった #319 * XML::Node#{add_child,add_previous_sibling,add_next_sibling,replace} は文字列を見送る際にNodeSetを返す * 不確定タグはフレグメント内で要、不要に関係なく解析される #315 * XML::Node#{replace,add_previous_sibling,add_next_sibling} libxmlのtext node merging に関わるedge caseを修正する #308 * xpath handler argument が整列している最中に起こるGCでのsegfaultを修正 #345 * Slop decoratorが既に確定された定義と共に正常に機能させるための便宜上の処置 #330 * 子ノードが複製される際に起こるメモリ漏れの修正 #353 * an+b記号の無使用時に発生するoff-by-oneバグとnth-last-{child,of-type} CSSセレクターの修正 #354 * 非名前空間属性がSAX::Document#start_elementへパスできるように修正 #356 * libxml2 in-contextの解析バグの処置  #362 * フレグメント内のノードの中にあるNodeSet#wrapの修正 #331 === 1.4.3 2010年7月28日 * 新しい機能 * XML::Reader#empty_element? - 子の無いエレメントにtrueを返す  #262 * Node#remove_namespaces! 
- 1.4.2では 名前空間のみを取り除いていたが、 1.4.3 では名前空間及び、名前空間宣言も取り除く #294 * NodeSet#{at_xpath,at_css,>} はNodeの同名メソッドと同様の動作 * バグの修正 * XML::NodeSet#{include?,delete,push} はXML::Namespaceを受入れる * XML::Document#parse - 1.4.3より文書内の文脈を解析する機能を追加 * XML::DocumentFragment#inner_html= 文脈解析を共に実行する #298, #281 * lib/nokogiri/css/parser.y はCSSと疑似選別の両方を機能 * 演算によって近隣に存在する併合型ノードへの遊離問題の有無に関わらず、一切の 弊害なしにテキストノードの繰り返しが実行可能  #283 * xmlFirstElementChild et al.による libxml2バージョンでの不適合性を修正 #303 * XML::Attr#add_namespace (!)文字通りの機能実現!  #252 * HTML::DocumentFragment が文字列に存在するエンコードを使用 #305 * CSS3の間接セレクタ"E ~ F G"がXPathの"//F//G[preceding-sibling::E]"へと 誤変換されてしまうのを修正 === 1.4.2 2010年5月22日 * 新機能 * XML::Node#parse 定義されたコンテキストノードでXML 又はHTMLのフレグメント を解析する * XML::Node#namespacesが子ノードとその祖先ノード内で定義された全ての名前空間 を返すようになった(以前は祖先ノードの名前空間は返されなかった) * XML::Node内にEnumerableを追加 * Nokogiri::XML::Schema#validate 与えられたファイル名が引き渡された時、 Nokogiri::XML::Schema#validateはxmlSchemaValidateFileを使用する (時間短縮化とメモリーの能率化の理由を基にファイル名での引き渡しメソッドを 採用) GH #219 * XML::Document#create_entnty は新規のEntityDecl のオブジェクトを生成する GH #174 * JRuby FFI implementationでは、従来まで使用されたObjectSpace._id2refの代わり にCharles Nutterのrocking Weakling gemを使用に変更 * Nokogiri::XML::Node#first_element_child は一番最初のELEMENT子ノードを返す * Nokogiri::XML::Node#last_element_child は最後のELEMENT子ノードを返す * Nokogiri::XML::Node#elements は全てのELEMENT子ノードを返す * Nokogiri::XML::Node#add_child, #add_previous_sibling, #before, #add_next_sibling, #after, #inner_html, #swap, #replaceはNode, DocumentFragment, NodeSetおよびマークアップ文字列を受け付ける * Node#fragment? 
はノードがDocumentFragmentかどうかを示す * バグの修正 * ドキュメント内にデコレータがある場合、XML::NodeSet は常にデコレータされる GH #198 * XML::NodeSet#slice がノードセットよりも長いoffset+lengthを問題なく処理する GH #200 * XML::Node#content=はノードとその直前に記述されている内容を支障なく切り離す GH #203 * XML::Node#namespace= はnilを一つのパラメーターと扱って取得する * XML::Node#xpath はNodeSetのオブジェクト以外のオブジェクトを返す GH #208 * XSLT::StyleSheet#transformはパラメーターのハッシュを受け入れる GH #223 * CSSのnot()の疑似セレクタの修正  GH #205 * XML::Builder はノードらが切り離されても破壊しない(vihaiの協力に感謝) GH #228 * SAX parser経由でエンコードを強制することが出来る  Eugene Pimenovに感謝! GH #204 * XML::DocumentFragment はML::Node#parse を使用して子を限定する * XML Reader内のメモリリーク修正  sdorさん、ありがとう! GH#244 * Node#replaceはRDocの通り新しい子ノードを返す(selfを返していた) * ノート * 今日4月18日現在、Windows gems は libxml 2.7.7 とlibxslt 1.1.26にDLLsを標準装備しています。このリリース以前にも既にDLLsはlibxml 2.7.3 と libxslt 1.1.24に標準装備済み。 === 1.4.1 2009年12月10日 * 新しい機能 * Nokogiri::LIBXML_ICONV_ENABLED を追加 * Node#attr は Node#[] のエイリアス定義に変更 * XML::Node#next_element を追加 * 直接の子ノードを検索するための Node#> を追加 * XML::NodeSet#reverse を追加 * 以下のfragment supportを追加   Node#add_child   Node#add_next_sibling Node#add_previous_sibling   Node#replace * XML::Node#previous_element を追加 * nokogiriがRubinius でサポートされるようになった * CSS selector の :has() が使用可能になった * XML::NodeSet#filter() を追加 * XML::Node.next= は add_next_sibling の alias へ変更 * XML::Node.previous= は add_previous_sibling の alias へ変更 * バグの修正 * XMLのフラグメントに名前空間が存在する場合のみ、DocumentFragmentを作る際に、 例外が投げられなくなった * DocumentFragment内で子ノードが存在する場合、 Node#matches?が機能するようになった GH #158 * Documentは add_namespace()を限定すべきではないので削除GH #169 * XPath クエリは名前空間の宣言を変換するがsegvではない。 * Node#replace は他のドキュメントのノードが使えるようになった * XML::Document#collect_namespaces を追加 * SOAP4R のアダプター内のバグ修正 * XML::Node#next_element 内のバグ修正 * WindowsでのJRuby の LOAD_PATH を修正 GH #160 * XSLT#apply_toは "output method"の値を使用する(richardlehaneに感謝) * 新しい文字列の先頭にくるテキストノードを含んだフレグメントが 正確に 解析出来るようになった GH #178 === 1.4.0 2009年10月30日 * 今日はノコギリの満一歳のお誕生日です * 新しい機能 * Node#at_xpath はXPath式に適合するNodeSetの一番最初の要素を返す * Node#at_css はCSSセレクターに適合するNodeSetの一番最初の要素を返す * NodeSet#| はNodeSet同士を合成する 
GH #119 (Serabe ありがとう!) * NodeSet#inspect の出力をより美しくした * Node#inspect の出力をよりrubyらしくした * XML::DTD#external_id を追加 * XML::DTD#system_id を追加 * XML::ElementContent はDTD要素のコンテンツを有効化する * Nokogiri::XML::Builder内での名前空間宣言用のサポートを改良 * XML::Node#external_subsetを追加 * XML::Node#create_external_subsetを追加 * XML::Node#create_internal_subsetを追加 * XML Builderは生成されていないstringsを付加出来る様になった (GH #141, patch from dudleyf) * XML::SAX::ParserContext を追加 * XML::Document#remove_namespaces! は名前空間を使いこなせない人たち用の措置 * バグの修正 * HTMLドキュメントが メタエンコーディングのタグを宣言しない時、 nilを返すようになった GH #115 * ENV['PATH'] を調節する為に、RbConfig::CONFIG['host_os']を使用できるように なった GH #113 * NodeSet#searchが更に効率的になった GH #119 (Serabe!に感謝します) * NodeSet#xpathがcustom xpath機能を取り扱える様になった * XML::Reader が現時点のノード用に属性を取得する際に、 SEGVを修正するようになった * Node#inner_html がNode#to_html と同じ独立変数を受け入れるようになった GH #117 * DocumentFragment#css は子ノードへ委任するようになった GH #123 * NodeSet#[]がNodeSet#lengthより大きいスライスでも機能するようになった GH #131 * 新たな親ノードの名前空間を維持出来るようになった GH #134 * XML::Document をNodeSetに追加する際のSEGVが修正された * XML::SyntaxError が複製可能になった * 廃棄予定 * Hpricot用の互換性レイヤーを削除 === 1.3.3 / 2009年7月26日 * 新しい機能 * NodeSet#children 全ての子ノードを返すようになった * バグの修正 * libxml-ruby のグローバ ルエラー ハンドラー に優先するようになった * ParseOption#strict を修正 * 空文字列を Node#inner_html= に与えた時に生じたSEGVを修正 GH #88 * Ruby 1.9 では文字列のエンコーディングをUTF-8になるようにした * ドキュメントの根ノードから違うドキュメントの根ノードに移動した時に生じた SEGVを修正 GH #91 * ノードをインスタンス化する時のO(n)のペナルティーを修正 GH #101 * XMLのドキュメントをHTMLのドキュメントとして出力出来るようになった * 廃棄予定 * Hpricotの互換性レイヤーがNokogiriの1.4.0で除去される予定 === 1.3.2 / 2009年6月22日 * 新しい機能 * Nokogiri::XML::DTD#validate はドキュメントを検証できるようになった * バグの修正 * Nokogiri::XML::NodeSet#search はトップレベルのノードを検索するようになった GH #73 * Nokogiri::XML::Documentからメソッドに関係する名前空間を取り除いた * 2回同じ名前空間が追加されたときSEGVする問題を修正した * Snow Leopard で Nokogiri が動くようになった GH #79 * メーリングリストはGoogle Groupsの以下のURLに移動した http://groups.google.com/group/nokogiri-talk * HTML フラグメントはコメントとCDATAを正確に扱うようになった * Nokogiri::XML::Document#cloneはdupのaliasになった * 廃棄予定 * Nokogiri::XML::SAX::Document#start_element_nsは廃棄予定なので 
Nokogiri::XML::SAX::Document#start_element_namespaceを代わりに使用して下さい * Nokogiri::XML::SAX::Document#end_element_nsは廃棄予定なので Nokogiri::XML::SAX::Document#end_element_namespaceを代わりに使用して下さい === 1.3.1 / 2009年6月7日 * バグの修正 * extconf.rb は任意のRelaxNGとSchemaの機能を探すようになった * ドキュメントのノードキャッシュに名前空間のノードが入るようになった === 1.3.0 / 2009年5月30日 * 新しい機能 * Builderがブロックの引数の数に応じてスコープが定まるようになった * Builderがアンダースコアで終わるメソッドをtagzと同様にサポートするようになった * Nokogiri::XML::Node#<=> がドキュメントの位置によりノードを比較するように なった * Nokogiri::XML::Node#matches?が与えられたセレクタ内でノードがあればtrue を返すようになった * Nokogiri::XML::Node#ancestors がNokogiri::XML::NodeSetオブジェクトを返すようになった * Nokogiri::XML::Node#ancestorsがオプションのセレクタに対応する親をマッチする ようになった * Nokogiri::HTML::Document#meta_encoding がメタデータのエンコードを返すように なった * Nokogiri::HTML::Document#meta_encoding= でメタデータのエンコードを 設定できるようになった * Nokogiri::XML::Document#encoding= でドキュメントのエンコードを 設定できるようになった * Nokogiri::XML::Schema でドキュメントがXSDのスキーマに沿って記述されているか を検証できるようになった * Nokogiri::XML::RelaxNG でドキュメントがRelaxNGのスキーマに沿って 記述されているかを検証できるようになった * Nokogiri::HTML::ElementDescription はHTML要素の説明フェッチ動作するよう になった * Nokogiri::XML::Node#descriptionは ノードの説明をフェッチ動作するよう になった * Nokogiri::XML::Node#accept は Visitor パターンを実行するようになった * 簡単にドキュメントを解析するコマンド bin/nokogiri を追加 (Yataka HARAさんに感謝感激) * Nokogiri::XML::NodeSetが更にArrayとEnumerableの演算を サポートするようになった: index, delete, slice, - (差分), + (連結), & (共通部分), push, pop, shift, == * Nokogiri.XML, Nokogiri.HTML はNokogiri::XML::ParseOptions objectと一緒に 呼び出されるブロックを受け入れるようになった * Nokogiri::XML::Node#namespace は Nokogiri::XML::Namespaceを返すようになった * Nokogiri::XML::Node#namespaceはノードの名前空間を設定するようになった * FFI 経由で JRuby 1.3.0 をサポートするようになった * バグの修正 * nilがCDATAsonstructorに渡される際の問題を修正 * Fragment メソッドが正規表現を抜け出させるようになった (Joelさんに感謝感激) (LH #73) * Builder スコープのLH #61, LH #74, LH #70に関しての様々な問題を修正 * 名前空間を付け加える時、名前空間が LH#78より除去されてしまう問題を修正 * 連結しないノードが発生し、再育成してしまう問題を修正(GH#22) * XSLT が解析中にエラーを発見し損なう問題を修正(GH#32) * CSS selectors内での条件属性のバグ問題を修正(GH#36) * Node#before/after/inner_html=で値なしのHTML属性が受け入れられなかった問題を 修正 (GH#35) === 1.2.3 / 2009年3月22日 
* バグの修正 * Node#new 内にて、バグを修正する * DocumentFragmentの作成時、名前空間に割り当てる LH #66 * Nokogiri::XML::NodeSet#dup は機能するようになった GH #10 * Nokogiri::HTMLは文字列がブランクの時、空のドキュメントで返す GH#11 * 子ノードを付加する事で、重複した名前空間の宣言を取り除く LH#67 * ビルダ方法はハッシュを第二引数とする === 1.2.2 / 2009年3月14日 * 新しい機能 * Nokogiri は soap4r と一緒に使う事が可能。(XSD::XMLParser::Nokogiri 参照) * Nokogiri::XML::Node#inner_html= はノードの中のHTMLをセット出来る * NokogiriのBuilderのインタフェースの改良 * Nokogiri::XML::Node#swap は、現在のノードに新しいhtmlを交換する事が出来る * バグの修正 * BuilderAPIのタグのネスティングを修正 (LH #41) * Nokogiri::HTML.fragment はテキストだけのノードを適切に扱う事が出来る(LH #43) * Nokogiri::XML::Node#before はテキストノードのはじめに挿入する事が出来る (LH #44) * Nokogiri::XML::Node#after はテキストノードの文末に挿入する事が出来る * Nokogiri::XML::Node#search 名前空間が自動的に登録されるようになった(LH#42) * Nokogiri::XML::NodeSet#search 名前空間が自動的に登録されるようになった * Nokogiri::HTML::NamedCharacters はlibxml2に委任 * Nokogiri::XML::Node#[] はSymbolを使う (LH #48) * vasprintf にwindowsを修正 (Geffroy Couprie ありがとう!) * Nokogiri::XML::Node#[]= はentityを符号化しない (LH #55) * 名前空間はreparentedのノードに模写する (LH #56) * StringのエンコードはRuby 1.9での初期設定を使用する * Document#dup は新しいドキュメントに同じタイプを作る (LH #59) * Document#parent は存在しない (LH #64) === 1.2.1 / 2009年2月23日 * 修正 * CSS のセレクターのスペースを修正 * Ruby 1.9 のStringのエンコードを修正 (角谷さんに感謝!) 
=== 1.2.0 / 2009年2月22日 * 新しい機能 * CSSサーチが CSS3 名前空間クエリをサポートするようになった * ルート要素での名前空間が自動的に登録されるようになった * CSS クエリが初期設定の名前空間を使うようになった * Nokogiri::XML::Document#encoding で文書にエンコードを使用、受け取る * Nokogiri::XML::Document#url で文書のURLを受け取る * Nokogiri::XML::Node#each はname属性、値を一組にし反復適用する * Nokogiri::XML::Node#keys はすべてのname属性を受け取る * Nokogiri::XML::Node#line は行番号をノード用に受け取る (Dirkjan Bussinkさんに感謝感激) * Nokogiri::XML::Node#serialize は任意されたencodingパラメーターを受け入れる * Nokogiri::XML::Node#to_html, to_xml, と to_xhtml は任意されたencodingパラメーターを受け入れる * Nokogiri::XML::Node#to_str * Nokogiri::XML::Node#to_xhtml でXHTML文書を生成する * Nokogiri::XML::Node#values が全ての属性値を受け入れる * Nokogiri::XML::Node#write_to は任意されたencodingで要素をIOオブジェクトへ書く * Nokogiri::XML::ProcessingInstrunction.new * Nokogiri::XML::SAX::PushParser は全てのプッシュパースに必要な解析をする * バグの修正 * Nokogiri::XML::Document#dup を修正 * ヘッダ検知を修正. 謝々るびきちさん! * 無効なCSS内にて解析機能を動かなくさせる原因を修正 * 廃棄予定 * Nokogiri::XML::Node.new_from_str は1.3.0にて廃棄予定 * APIの変更 * Nokogiri::HTML.fragment は XML::DocumentFragment (LH #32)で返す === 1.1.1 * 新しい機能 * XML::Node#elem? 
を追加 * XML::Node#attribute_nodes を追加 * XML::Attr を追加 * XML::Node#delete を追加 * XML::NodeSet#inner_html を追加 * バグの修正 * HTML のノードに \r のエンティティを含めない * CSS::SelectorHandler と XML::XPathHandler を除去 * XML::Node#attributes が Attr node を値として返す * XML::NodeSet が to_xml を実装 === 1.1.0 * 新しい機能 * カスタム XPath 機能を追加。( Nokogiri::XML::Node#xpath 参照 ) * カスタム CSS 擬似クラスと機能を追加。( Nokogiri::XML::Node#css 参照 ) * Nokogiri::XML::Node#<< が作成中に子ノードを自動追加 * バグの修正 * mutex が CSS のキャッシュのアクセスをロックする * GCC 3.3.5 のビルドに関する問題を修正 * XML::Node#to_xml が引数indentationを取る * XML::Node#dup が引数任意のdepthを取る * XML::Node#add_previous_sibling が新しい兄弟ノードで返す === 1.0.7 * バグの修正 * Dike 使用時中のメモリーリークの修正 * SAX パーサーが IO Stream を逐次解析 * コメント nodes が独自のクラスを継承する * Nokogiri() は Nokogiri.parse() へデリゲートする * ENV['PATH'] に付加する代わりに先頭へ挿入する (Windows) * 複雑な CSS 内のバグを修正完了 :not selector ではありません === 1.0.6 * 5つの修正 * XPath のパーサーが SyntaxError を生じさせ解析停止させる * CSS のパーサーが SyntaxError を生じさせ解析停止させる * filter() と not() hpricot の互換性を追加 * CSS が Node#search 経由で検索し、常時対応する事が出来るようになった * CSS より XPath 変換がキャッシュに入れられるようになった === 1.0.5 * バグフィックス * メーリングリストを作成 * バグファイルを作成 * Windows 内で ENV['PATH'] が存在しない場合でも、存在出来るように設定完了 * Document 内の NodeSet#[] の結果をキャッシュする === 1.0.4 * バグフィックス * 弱参照からドキュメント参照へのメモリー管理の変換 * メモリリークに接続 * Builderブロックが取り囲んでいるコンテキストから メソッドの呼び出しをする事が出来る === 1.0.3 * 5つのバグ修正 * NodeSet が to_ary を実装 * XML::Document#parent を除去 * GCバグ修正済み (Mike は最高!) * 1.8.5互換性の為の RARRAY_LEN 除去 * inner_html 修正済み (Yahuda に感謝) === 1.0.2 * 1つのバグ修正 * extconf.rb は frex や racc の存在をチェックすべきでない === 1.0.1 * 1つのバグ修正 * extconf.rb が libdir や prefix を検索しない事を確認済み それによって、ports libxml/ruby が正しくリンクする (lucsky に感謝!) 
=== 1.0.0 / 2008-07-13

* 1 major enhancement
  * Birthday!

nokogiri-1.6.1/metadata.yml0000644000175000017500000003663312261213762015236 0ustar boutilboutil
--- !ruby/object:Gem::Specification
name: nokogiri
version: !ruby/object:Gem::Version
  version: 1.6.1
platform: ruby
authors:
- Aaron Patterson
- Mike Dalessio
- Yoko Harada
- Tim Elliott
autorequire:
bindir: bin
cert_chain: []
date: 2013-12-14 00:00:00 Z
dependencies:
- !ruby/object:Gem::Dependency
  version_requirements: &id001 !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 0.5.0
  type: :runtime
  name: mini_portile
  prerelease: false
  requirement: *id001
- !ruby/object:Gem::Dependency
  version_requirements: &id002 !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: "4.0"
  type: :development
  name: rdoc
  prerelease: false
  requirement: *id002
- !ruby/object:Gem::Dependency
  version_requirements: &id003 !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "1.1"
  type: :development
  name: hoe-bundler
  prerelease: false
  requirement: *id003
- !ruby/object:Gem::Dependency
  version_requirements: &id004 !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 1.0.3
  type: :development
  name: hoe-debugging
  prerelease: false
  requirement: *id004
- !ruby/object:Gem::Dependency
  version_requirements: &id005 !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "1.0"
  type: :development
  name: hoe-gemspec
  prerelease: false
  requirement: *id005
- !ruby/object:Gem::Dependency
  version_requirements: &id006 !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "1.4"
  type: :development
  name: hoe-git
  prerelease: false
  requirement: *id006
- !ruby/object:Gem::Dependency
  version_requirements: &id007 !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 2.2.2
  type: :development
  name: minitest
  prerelease: false
  requirement: *id007
- !ruby/object:Gem::Dependency
  version_requirements: &id008 !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: "0.9"
  type: :development
  name: rake
  prerelease: false
  requirement: *id008
- !ruby/object:Gem::Dependency
  version_requirements: &id009 !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: 0.8.0
  type: :development
  name: rake-compiler
  prerelease: false
  requirement: *id009
- !ruby/object:Gem::Dependency
  version_requirements: &id010 !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 1.4.6
  type: :development
  name: racc
  prerelease: false
  requirement: *id010
- !ruby/object:Gem::Dependency
  version_requirements: &id011 !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: 1.0.5
  type: :development
  name: rexical
  prerelease: false
  requirement: *id011
- !ruby/object:Gem::Dependency
  version_requirements: &id012 !ruby/object:Gem::Requirement
    requirements:
    - - ~>
      - !ruby/object:Gem::Version
        version: "3.7"
  type: :development
  name: hoe
  prerelease: false
  requirement: *id012
description: "Nokogiri (\xE9\x8B\xB8) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's\n\
  many features is the ability to search documents via XPath or CSS3 selectors.\n\n\
  XML is like violence - if it doesn\xE2\x80\x99t solve your problems, you are not using\n\
  enough of it."
email:
- aaronp@rubyforge.org
- mike.dalessio@gmail.com
- yokolet@gmail.com
- tle@holymonkey.com
executables:
- nokogiri
extensions:
- ext/nokogiri/extconf.rb
extra_rdoc_files:
- CHANGELOG.ja.rdoc
- CHANGELOG.rdoc
- C_CODING_STYLE.rdoc
- Manifest.txt
- README.ja.rdoc
- README.rdoc
- ROADMAP.md
- STANDARD_RESPONSES.md
- Y_U_NO_GEMSPEC.md
- ext/nokogiri/html_document.c
- ext/nokogiri/html_element_description.c
- ext/nokogiri/html_entity_lookup.c
- ext/nokogiri/html_sax_parser_context.c
- ext/nokogiri/html_sax_push_parser.c
- ext/nokogiri/nokogiri.c
- ext/nokogiri/xml_attr.c
- ext/nokogiri/xml_attribute_decl.c
- ext/nokogiri/xml_cdata.c
- ext/nokogiri/xml_comment.c
- ext/nokogiri/xml_document.c
- ext/nokogiri/xml_document_fragment.c
- ext/nokogiri/xml_dtd.c
- ext/nokogiri/xml_element_content.c
- ext/nokogiri/xml_element_decl.c
- ext/nokogiri/xml_encoding_handler.c
- ext/nokogiri/xml_entity_decl.c
- ext/nokogiri/xml_entity_reference.c
- ext/nokogiri/xml_io.c
- ext/nokogiri/xml_libxml2_hacks.c
- ext/nokogiri/xml_namespace.c
- ext/nokogiri/xml_node.c
- ext/nokogiri/xml_node_set.c
- ext/nokogiri/xml_processing_instruction.c
- ext/nokogiri/xml_reader.c
- ext/nokogiri/xml_relax_ng.c
- ext/nokogiri/xml_sax_parser.c
- ext/nokogiri/xml_sax_parser_context.c
- ext/nokogiri/xml_sax_push_parser.c
- ext/nokogiri/xml_schema.c
- ext/nokogiri/xml_syntax_error.c
- ext/nokogiri/xml_text.c
- ext/nokogiri/xml_xpath_context.c
- ext/nokogiri/xslt_stylesheet.c
files:
- .autotest
- .gemtest
- .travis.yml
- CHANGELOG.ja.rdoc
- CHANGELOG.rdoc
- C_CODING_STYLE.rdoc
- Gemfile
- Manifest.txt
- README.ja.rdoc
- README.rdoc
- ROADMAP.md
- Rakefile
- STANDARD_RESPONSES.md
- Y_U_NO_GEMSPEC.md
- bin/nokogiri
- build_all
- dependencies.yml
- ext/nokogiri/depend
- ext/nokogiri/extconf.rb
- ext/nokogiri/html_document.c
- ext/nokogiri/html_document.h
- ext/nokogiri/html_element_description.c
- ext/nokogiri/html_element_description.h
- ext/nokogiri/html_entity_lookup.c
- ext/nokogiri/html_entity_lookup.h
- ext/nokogiri/html_sax_parser_context.c
- ext/nokogiri/html_sax_parser_context.h
- ext/nokogiri/html_sax_push_parser.c
- ext/nokogiri/html_sax_push_parser.h
- ext/nokogiri/nokogiri.c
- ext/nokogiri/nokogiri.h
- ext/nokogiri/xml_attr.c
- ext/nokogiri/xml_attr.h
- ext/nokogiri/xml_attribute_decl.c
- ext/nokogiri/xml_attribute_decl.h
- ext/nokogiri/xml_cdata.c
- ext/nokogiri/xml_cdata.h
- ext/nokogiri/xml_comment.c
- ext/nokogiri/xml_comment.h
- ext/nokogiri/xml_document.c
- ext/nokogiri/xml_document.h
- ext/nokogiri/xml_document_fragment.c
- ext/nokogiri/xml_document_fragment.h
- ext/nokogiri/xml_dtd.c
- ext/nokogiri/xml_dtd.h
- ext/nokogiri/xml_element_content.c
- ext/nokogiri/xml_element_content.h
- ext/nokogiri/xml_element_decl.c
- ext/nokogiri/xml_element_decl.h
- ext/nokogiri/xml_encoding_handler.c
- ext/nokogiri/xml_encoding_handler.h
- ext/nokogiri/xml_entity_decl.c
- ext/nokogiri/xml_entity_decl.h
- ext/nokogiri/xml_entity_reference.c
- ext/nokogiri/xml_entity_reference.h
- ext/nokogiri/xml_io.c
- ext/nokogiri/xml_io.h
- ext/nokogiri/xml_libxml2_hacks.c
- ext/nokogiri/xml_libxml2_hacks.h
- ext/nokogiri/xml_namespace.c
- ext/nokogiri/xml_namespace.h
- ext/nokogiri/xml_node.c
- ext/nokogiri/xml_node.h
- ext/nokogiri/xml_node_set.c
- ext/nokogiri/xml_node_set.h
- ext/nokogiri/xml_processing_instruction.c
- ext/nokogiri/xml_processing_instruction.h
- ext/nokogiri/xml_reader.c
- ext/nokogiri/xml_reader.h
- ext/nokogiri/xml_relax_ng.c
- ext/nokogiri/xml_relax_ng.h
- ext/nokogiri/xml_sax_parser.c
- ext/nokogiri/xml_sax_parser.h
- ext/nokogiri/xml_sax_parser_context.c
- ext/nokogiri/xml_sax_parser_context.h
- ext/nokogiri/xml_sax_push_parser.c
- ext/nokogiri/xml_sax_push_parser.h
- ext/nokogiri/xml_schema.c
- ext/nokogiri/xml_schema.h
- ext/nokogiri/xml_syntax_error.c
- ext/nokogiri/xml_syntax_error.h
- ext/nokogiri/xml_text.c
- ext/nokogiri/xml_text.h
- ext/nokogiri/xml_xpath_context.c
- ext/nokogiri/xml_xpath_context.h
- ext/nokogiri/xslt_stylesheet.c
- ext/nokogiri/xslt_stylesheet.h
- lib/nokogiri.rb
- lib/nokogiri/css.rb
- lib/nokogiri/css/node.rb
- lib/nokogiri/css/parser.rb
- lib/nokogiri/css/parser.y
- lib/nokogiri/css/parser_extras.rb
- lib/nokogiri/css/syntax_error.rb
- lib/nokogiri/css/tokenizer.rb
- lib/nokogiri/css/tokenizer.rex
- lib/nokogiri/css/xpath_visitor.rb
- lib/nokogiri/decorators/slop.rb
- lib/nokogiri/html.rb
- lib/nokogiri/html/builder.rb
- lib/nokogiri/html/document.rb
- lib/nokogiri/html/document_fragment.rb
- lib/nokogiri/html/element_description.rb
- lib/nokogiri/html/element_description_defaults.rb
- lib/nokogiri/html/entity_lookup.rb
- lib/nokogiri/html/sax/parser.rb
- lib/nokogiri/html/sax/parser_context.rb
- lib/nokogiri/html/sax/push_parser.rb
- lib/nokogiri/syntax_error.rb
- lib/nokogiri/version.rb
- lib/nokogiri/xml.rb
- lib/nokogiri/xml/attr.rb
- lib/nokogiri/xml/attribute_decl.rb
- lib/nokogiri/xml/builder.rb
- lib/nokogiri/xml/cdata.rb
- lib/nokogiri/xml/character_data.rb
- lib/nokogiri/xml/document.rb
- lib/nokogiri/xml/document_fragment.rb
- lib/nokogiri/xml/dtd.rb
- lib/nokogiri/xml/element_content.rb
- lib/nokogiri/xml/element_decl.rb
- lib/nokogiri/xml/entity_decl.rb
- lib/nokogiri/xml/namespace.rb
- lib/nokogiri/xml/node.rb
- lib/nokogiri/xml/node/save_options.rb
- lib/nokogiri/xml/node_set.rb
- lib/nokogiri/xml/notation.rb
- lib/nokogiri/xml/parse_options.rb
- lib/nokogiri/xml/pp.rb
- lib/nokogiri/xml/pp/character_data.rb
- lib/nokogiri/xml/pp/node.rb
- lib/nokogiri/xml/processing_instruction.rb
- lib/nokogiri/xml/reader.rb
- lib/nokogiri/xml/relax_ng.rb
- lib/nokogiri/xml/sax.rb
- lib/nokogiri/xml/sax/document.rb
- lib/nokogiri/xml/sax/parser.rb
- lib/nokogiri/xml/sax/parser_context.rb
- lib/nokogiri/xml/sax/push_parser.rb
- lib/nokogiri/xml/schema.rb
- lib/nokogiri/xml/syntax_error.rb
- lib/nokogiri/xml/text.rb
- lib/nokogiri/xml/xpath.rb
- lib/nokogiri/xml/xpath/syntax_error.rb
- lib/nokogiri/xml/xpath_context.rb
- lib/nokogiri/xslt.rb
- lib/nokogiri/xslt/stylesheet.rb
- lib/xsd/xmlparser/nokogiri.rb
- tasks/cross_compile.rb
- tasks/nokogiri.org.rb
- tasks/test.rb
- test/css/test_nthiness.rb
- test/css/test_parser.rb
- test/css/test_tokenizer.rb
- test/css/test_xpath_visitor.rb
- test/decorators/test_slop.rb
- test/files/2ch.html
- test/files/address_book.rlx
- test/files/address_book.xml
- test/files/bar/bar.xsd
- test/files/bogus.xml
- test/files/dont_hurt_em_why.xml
- test/files/encoding.html
- test/files/encoding.xhtml
- test/files/exslt.xml
- test/files/exslt.xslt
- test/files/foo/foo.xsd
- test/files/metacharset.html
- test/files/noencoding.html
- test/files/po.xml
- test/files/po.xsd
- test/files/saml/saml20assertion_schema.xsd
- test/files/saml/saml20protocol_schema.xsd
- test/files/saml/xenc_schema.xsd
- test/files/saml/xmldsig_schema.xsd
- test/files/shift_jis.html
- test/files/shift_jis.xml
- test/files/snuggles.xml
- test/files/staff.dtd
- test/files/staff.xml
- test/files/staff.xslt
- test/files/test_document_url/bar.xml
- test/files/test_document_url/document.dtd
- test/files/test_document_url/document.xml
- test/files/tlm.html
- test/files/to_be_xincluded.xml
- test/files/valid_bar.xml
- test/files/xinclude.xml
- test/helper.rb
- test/html/sax/test_parser.rb
- test/html/sax/test_parser_context.rb
- test/html/test_builder.rb
- test/html/test_document.rb
- test/html/test_document_encoding.rb
- test/html/test_document_fragment.rb
- test/html/test_element_description.rb
- test/html/test_named_characters.rb
- test/html/test_node.rb
- test/html/test_node_encoding.rb
- test/namespaces/test_additional_namespaces_in_builder_doc.rb
- test/namespaces/test_namespaces_in_builder_doc.rb
- test/namespaces/test_namespaces_in_created_doc.rb
- test/namespaces/test_namespaces_in_parsed_doc.rb
- test/test_convert_xpath.rb
- test/test_css_cache.rb
- test/test_encoding_handler.rb
- test/test_memory_leak.rb
- test/test_nokogiri.rb
- test/test_reader.rb
- test/test_soap4r_sax.rb
- test/test_xslt_transforms.rb
- test/xml/node/test_save_options.rb
- test/xml/node/test_subclass.rb
- test/xml/sax/test_parser.rb
- test/xml/sax/test_parser_context.rb
- test/xml/sax/test_push_parser.rb
- test/xml/test_attr.rb
- test/xml/test_attribute_decl.rb
- test/xml/test_builder.rb
- test/xml/test_c14n.rb
- test/xml/test_cdata.rb
- test/xml/test_comment.rb
- test/xml/test_document.rb
- test/xml/test_document_encoding.rb
- test/xml/test_document_fragment.rb
- test/xml/test_dtd.rb
- test/xml/test_dtd_encoding.rb
- test/xml/test_element_content.rb
- test/xml/test_element_decl.rb
- test/xml/test_entity_decl.rb
- test/xml/test_entity_reference.rb
- test/xml/test_namespace.rb
- test/xml/test_node.rb
- test/xml/test_node_attributes.rb
- test/xml/test_node_encoding.rb
- test/xml/test_node_inheritance.rb
- test/xml/test_node_reparenting.rb
- test/xml/test_node_set.rb
- test/xml/test_parse_options.rb
- test/xml/test_processing_instruction.rb
- test/xml/test_reader_encoding.rb
- test/xml/test_relax_ng.rb
- test/xml/test_schema.rb
- test/xml/test_syntax_error.rb
- test/xml/test_text.rb
- test/xml/test_unparented_node.rb
- test/xml/test_xinclude.rb
- test/xml/test_xpath.rb
- test/xslt/test_custom_functions.rb
- test/xslt/test_exception_handling.rb
- test_all
- ports/archives/libxml2-2.8.0.tar.gz
- ports/archives/libxslt-1.1.26.tar.gz
homepage: http://nokogiri.org
licenses:
- MIT
metadata: {}
post_install_message:
rdoc_options:
- --main
- README.rdoc
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: 1.9.2
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: "0"
requirements: []
rubyforge_project: nokogiri
rubygems_version: 2.0.13
signing_key:
specification_version: 4
summary: "Nokogiri (\xE9\x8B\xB8) is an HTML, XML, SAX, and Reader parser"
test_files:
- test/decorators/test_slop.rb
- test/test_encoding_handler.rb
- test/css/test_parser.rb
- test/css/test_nthiness.rb
- test/css/test_tokenizer.rb
- test/css/test_xpath_visitor.rb
- test/xslt/test_exception_handling.rb
- test/xslt/test_custom_functions.rb
- test/test_reader.rb
- test/xml/test_comment.rb
- test/xml/test_unparented_node.rb
- test/xml/test_processing_instruction.rb
- test/xml/test_node_attributes.rb
- test/xml/test_xpath.rb
- test/xml/test_node_encoding.rb
- test/xml/test_element_decl.rb
- test/xml/test_entity_decl.rb
- test/xml/test_namespace.rb
- test/xml/test_cdata.rb
- test/xml/test_node_inheritance.rb
- test/xml/test_entity_reference.rb
- test/xml/test_text.rb
- test/xml/test_reader_encoding.rb
- test/xml/test_dtd.rb
- test/xml/test_xinclude.rb
- test/xml/test_parse_options.rb
- test/xml/test_schema.rb
- test/xml/test_element_content.rb
- test/xml/test_document.rb
- test/xml/test_relax_ng.rb
- test/xml/test_c14n.rb
- test/xml/test_dtd_encoding.rb
- test/xml/test_syntax_error.rb
- test/xml/test_attribute_decl.rb
- test/xml/test_node_set.rb
- test/xml/test_builder.rb
- test/xml/sax/test_parser.rb
- test/xml/sax/test_push_parser.rb
- test/xml/sax/test_parser_context.rb
- test/xml/test_document_encoding.rb
- test/xml/test_attr.rb
- test/xml/test_document_fragment.rb
- test/xml/test_node.rb
- test/xml/test_node_reparenting.rb
- test/xml/node/test_save_options.rb
- test/xml/node/test_subclass.rb
- test/test_css_cache.rb
- test/test_soap4r_sax.rb
- test/html/test_node_encoding.rb
- test/html/test_document.rb
- test/html/test_named_characters.rb
- test/html/test_builder.rb
- test/html/sax/test_parser.rb
- test/html/sax/test_parser_context.rb
- test/html/test_document_encoding.rb
- test/html/test_element_description.rb
- test/html/test_document_fragment.rb
- test/html/test_node.rb
- test/test_memory_leak.rb
- test/test_convert_xpath.rb
- test/namespaces/test_namespaces_in_builder_doc.rb
- test/namespaces/test_namespaces_in_created_doc.rb
- test/namespaces/test_additional_namespaces_in_builder_doc.rb
- test/namespaces/test_namespaces_in_parsed_doc.rb
- test/test_xslt_transforms.rb
- test/test_nokogiri.rb