html5lib-0.999/0000755000175000001440000000000012256051465014427 5ustar gsneddersusers00000000000000html5lib-0.999/requirements-optional-cpython.txt0000644000175000001440000000021712163146070023212 0ustar gsneddersusers00000000000000-r requirements-optional.txt # lxml is supported with its own treebuilder ("lxml") and otherwise # uses the standard ElementTree support lxml html5lib-0.999/CHANGES.rst0000644000175000001440000001003212256051425016221 0ustar gsneddersusers00000000000000Change Log ---------- 0.999 ~~~~~ Released on December 23, 2013 * Fix #127: add work-around for CPython issue #20007: .read(0) on http.client.HTTPResponse drops the rest of the content. * Fix #115: lxml treewalker can now deal with fragments containing, at their root level, text nodes with non-ASCII characters on Python 2. 0.99 ~~~~ Released on September 10, 2013 * No library changes from 1.0b3; released as 0.99 as pip has changed behaviour from 1.4 to avoid installing pre-release versions per PEP 440. 1.0b3 ~~~~~ Released on July 24, 2013 * Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any implementation using it should be moved to ``NonRecursiveTreeWalker``, as everything bundled with html5lib has for years. * Fix #67 so that ``BufferedStream`` correctly returns a bytes object, thereby fixing any case where html5lib is passed a non-seekable RawIOBase-like object. 1.0b2 ~~~~~ Released on June 27, 2013 * Removed reordering of attributes within the serializer. There is now an ``alphabetical_attributes`` option which preserves the previous behaviour through a new filter. This allows attribute order to be preserved through html5lib if the tree builder preserves order. * Removed ``dom2sax`` from DOM treebuilders. It has been replaced by ``treeadapters.sax.to_sax`` which is generic and supports any treewalker; it also resolves all known bugs with ``dom2sax``. * Fix treewalker assertions on hitting bytes strings on Python 2.
Previous to 1.0b1, treewalkers coped with mixed bytes/unicode data on Python 2; this reintroduces this prior behaviour on Python 2. Behaviour is unchanged on Python 3. 1.0b1 ~~~~~ Released on May 17, 2013 * Implementation updated to implement the `HTML specification `_ as of 5th May 2013 (`SVN `_ revision r7867). * Python 3.2+ supported in a single codebase using the ``six`` library. * Removed support for Python 2.5 and older. * Removed the deprecated Beautiful Soup 3 treebuilder. ``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that since it doesn't support namespaces, foreign content like SVG and MathML is parsed incorrectly. * Removed ``simpletree`` from the package. The default tree builder is now ``etree`` (using the ``xml.etree.cElementTree`` implementation if available, and ``xml.etree.ElementTree`` otherwise). * Removed the ``XHTMLSerializer`` as it never actually guaranteed its output was well-formed XML, and hence provided little of use. * Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will return the default DOM treebuilder, which uses ``xml.dom.minidom``. * Optional heuristic character encoding detection now based on ``charade`` for Python 2.6 - 3.3 compatibility. * Optional ``Genshi`` treewalker support fixed. * Many bugfixes, including: * #33: null in attribute value breaks XML AttValue; * #4: nested, indirect descendant, #errors Line: 1 Col: 7 Unexpected start tag (table). Expected DOCTYPE. Line: 1 Col: 20 Unexpected end tag (strong) in table context caused voodoo mode. Line: 1 Col: 20 End tag (strong) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 24 Unexpected end tag (b) in table context caused voodoo mode. Line: 1 Col: 24 End tag (b) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 29 Unexpected end tag (em) in table context caused voodoo mode. 
Line: 1 Col: 29 End tag (em) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 33 Unexpected end tag (i) in table context caused voodoo mode. Line: 1 Col: 33 End tag (i) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 37 Unexpected end tag (u) in table context caused voodoo mode. Line: 1 Col: 37 End tag (u) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 46 Unexpected end tag (strike) in table context caused voodoo mode. Line: 1 Col: 46 End tag (strike) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 50 Unexpected end tag (s) in table context caused voodoo mode. Line: 1 Col: 50 End tag (s) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 58 Unexpected end tag (blink) in table context caused voodoo mode. Line: 1 Col: 58 Unexpected end tag (blink). Ignored. Line: 1 Col: 63 Unexpected end tag (tt) in table context caused voodoo mode. Line: 1 Col: 63 End tag (tt) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 69 Unexpected end tag (pre) in table context caused voodoo mode. Line: 1 Col: 69 End tag (pre) seen too early. Expected other end tag. Line: 1 Col: 75 Unexpected end tag (big) in table context caused voodoo mode. Line: 1 Col: 75 End tag (big) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 83 Unexpected end tag (small) in table context caused voodoo mode. Line: 1 Col: 83 End tag (small) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 90 Unexpected end tag (font) in table context caused voodoo mode. Line: 1 Col: 90 End tag (font) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 99 Unexpected end tag (select) in table context caused voodoo mode. Line: 1 Col: 99 Unexpected end tag (select). Ignored. Line: 1 Col: 104 Unexpected end tag (h1) in table context caused voodoo mode. Line: 1 Col: 104 End tag (h1) seen too early. 
Expected other end tag. Line: 1 Col: 109 Unexpected end tag (h2) in table context caused voodoo mode. Line: 1 Col: 109 End tag (h2) seen too early. Expected other end tag. Line: 1 Col: 114 Unexpected end tag (h3) in table context caused voodoo mode. Line: 1 Col: 114 End tag (h3) seen too early. Expected other end tag. Line: 1 Col: 119 Unexpected end tag (h4) in table context caused voodoo mode. Line: 1 Col: 119 End tag (h4) seen too early. Expected other end tag. Line: 1 Col: 124 Unexpected end tag (h5) in table context caused voodoo mode. Line: 1 Col: 124 End tag (h5) seen too early. Expected other end tag. Line: 1 Col: 129 Unexpected end tag (h6) in table context caused voodoo mode. Line: 1 Col: 129 End tag (h6) seen too early. Expected other end tag. Line: 1 Col: 136 Unexpected end tag (body) in the table row phase. Ignored. Line: 1 Col: 141 Unexpected end tag (br) in table context caused voodoo mode. Line: 1 Col: 141 Unexpected end tag (br). Treated as br element. Line: 1 Col: 145 Unexpected end tag (a) in table context caused voodoo mode. Line: 1 Col: 145 End tag (a) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 151 Unexpected end tag (img) in table context caused voodoo mode. Line: 1 Col: 151 This element (img) has no end tag. Line: 1 Col: 159 Unexpected end tag (title) in table context caused voodoo mode. Line: 1 Col: 159 Unexpected end tag (title). Ignored. Line: 1 Col: 166 Unexpected end tag (span) in table context caused voodoo mode. Line: 1 Col: 166 Unexpected end tag (span). Ignored. Line: 1 Col: 174 Unexpected end tag (style) in table context caused voodoo mode. Line: 1 Col: 174 Unexpected end tag (style). Ignored. Line: 1 Col: 183 Unexpected end tag (script) in table context caused voodoo mode. Line: 1 Col: 183 Unexpected end tag (script). Ignored. Line: 1 Col: 196 Unexpected end tag (th). Ignored. Line: 1 Col: 201 Unexpected end tag (td). Ignored. Line: 1 Col: 206 Unexpected end tag (tr). Ignored. 
Line: 1 Col: 214 This element (frame) has no end tag. Line: 1 Col: 221 This element (area) has no end tag. Line: 1 Col: 228 Unexpected end tag (link). Ignored. Line: 1 Col: 236 This element (param) has no end tag. Line: 1 Col: 241 This element (hr) has no end tag. Line: 1 Col: 249 This element (input) has no end tag. Line: 1 Col: 255 Unexpected end tag (col). Ignored. Line: 1 Col: 262 Unexpected end tag (base). Ignored. Line: 1 Col: 269 Unexpected end tag (meta). Ignored. Line: 1 Col: 280 This element (basefont) has no end tag. Line: 1 Col: 290 This element (bgsound) has no end tag. Line: 1 Col: 298 This element (embed) has no end tag. Line: 1 Col: 307 This element (spacer) has no end tag. Line: 1 Col: 311 Unexpected end tag (p). Ignored. Line: 1 Col: 316 End tag (dd) seen too early. Expected other end tag. Line: 1 Col: 321 End tag (dt) seen too early. Expected other end tag. Line: 1 Col: 331 Unexpected end tag (caption). Ignored. Line: 1 Col: 342 Unexpected end tag (colgroup). Ignored. Line: 1 Col: 350 Unexpected end tag (tbody). Ignored. Line: 1 Col: 358 Unexpected end tag (tfoot). Ignored. Line: 1 Col: 366 Unexpected end tag (thead). Ignored. Line: 1 Col: 376 End tag (address) seen too early. Expected other end tag. Line: 1 Col: 389 End tag (blockquote) seen too early. Expected other end tag. Line: 1 Col: 398 End tag (center) seen too early. Expected other end tag. Line: 1 Col: 404 Unexpected end tag (dir). Ignored. Line: 1 Col: 410 End tag (div) seen too early. Expected other end tag. Line: 1 Col: 415 End tag (dl) seen too early. Expected other end tag. Line: 1 Col: 426 End tag (fieldset) seen too early. Expected other end tag. Line: 1 Col: 436 End tag (listing) seen too early. Expected other end tag. Line: 1 Col: 443 End tag (menu) seen too early. Expected other end tag. Line: 1 Col: 448 End tag (ol) seen too early. Expected other end tag. Line: 1 Col: 453 End tag (ul) seen too early. Expected other end tag. Line: 1 Col: 458 End tag (li) seen too early. 
Expected other end tag. Line: 1 Col: 465 End tag (nobr) violates step 1, paragraph 1 of the adoption agency algorithm. Line: 1 Col: 471 This element (wbr) has no end tag. Line: 1 Col: 487 End tag (button) seen too early. Expected other end tag. Line: 1 Col: 497 End tag (marquee) seen too early. Expected other end tag. Line: 1 Col: 506 End tag (object) seen too early. Expected other end tag. Line: 1 Col: 524 Unexpected end tag (html). Ignored. Line: 1 Col: 524 Unexpected end tag (frameset). Ignored. Line: 1 Col: 531 Unexpected end tag (head). Ignored. Line: 1 Col: 540 Unexpected end tag (iframe). Ignored. Line: 1 Col: 548 This element (image) has no end tag. Line: 1 Col: 558 This element (isindex) has no end tag. Line: 1 Col: 568 Unexpected end tag (noembed). Ignored. Line: 1 Col: 579 Unexpected end tag (noframes). Ignored. Line: 1 Col: 590 Unexpected end tag (noscript). Ignored. Line: 1 Col: 601 Unexpected end tag (optgroup). Ignored. Line: 1 Col: 610 Unexpected end tag (option). Ignored. Line: 1 Col: 622 Unexpected end tag (plaintext). Ignored. Line: 1 Col: 633 Unexpected end tag (textarea). Ignored. #document | | | |
| | | |

#data #errors Line: 1 Col: 10 Unexpected start tag (frameset). Expected DOCTYPE. Line: 1 Col: 10 Expected closing tag. Unexpected end of file. #document | | | html5lib-0.999/html5lib/tests/testdata/tree-construction/scriptdata01.dat0000644000175000001440000001030112131123637027566 0ustar gsneddersusers00000000000000#data FOOBAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | BAR #errors #document | | | | "FOO" | QUX #errors #document | | | | "FOO" | x #errors #document | | | | #errors #document | | | | | | #data

#errors #document | | | | |
| | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | #data #errors #document | | | | | | | #data
#errors #document | | | | | | | |
| | | | | #data #errors #document | | | | | | | | | | | | | | html5lib-0.999/html5lib/tests/testdata/tree-construction/tables01.dat0000644000175000001440000000532212131123637026711 0ustar gsneddersusers00000000000000#data
#errors #document | | | | | | |
#data
#errors #document | | | | | | |
#data #errors #document | | | |
| | | foo="bar" #data
foo #errors #document | | | | "foo" |
| #data

foo #errors #document | | | | |

| "foo" #data

#errors #document | | | | | | |
#data
#errors #document | | | | #data
#errors #document | | | | |
#data
#errors #document | | | | #data
B
#errors #document | | | | | | |
| "B" #data
foo #errors #document | | | | | | |
| "foo" #data
A
B #errors #document | | | | | | |
| "A" | "B" #data
#errors #document | | | | | | |
#data
foo #errors #document | | | | | | |
| "foo" #data #errors #document | | | |
| | | #data
|
#errors #document | | | | | | |
| #data
#errors #document | | | | | | |
| | | html5lib-0.999/html5lib/tests/testdata/tree-construction/domjs-unsafe.dat0000644000175000001440000001475712131123637027705 0ustar gsneddersusers00000000000000#data foo bar #errors #document | | | | | "foo bar" #data foo bar #errors #document | | | | | "foo bar" #data foo bar #errors #document | | | | | "foo bar" #data #errors #document | | | #errors #document | | | #errors #document | | | #errors #document | | | #errors #document | | | #errors #document | | | #errors #document | | | #errors #document | | | #errors #document | | | #errors #document | | | abc #errors #document | | | | | "abc" | | | | abc #errors #document | | | | | "abc" |
| | | abc #errors #document | | | | |
|
| abc #errors #document | | | | | | | | #errors #document | | | |
| abc #errors #document | | | | | | abc #errors #document | | | | |