HTML-FormatExternal-26/

HTML-FormatExternal-26/COPYING

                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU General Public License is a free, copyleft license for software and other kinds of works.

  The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.

  When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

  To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

  For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. 
Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. 
A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. 
All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. 
When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. 
This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. 
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. 
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. 
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. 
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. 
The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. 
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read . HTML-FormatExternal-26/Changes0000644000175000017500000000436612570233240014056 0ustar ggggCopyright 2008, 2009, 2010, 2011, 2013, 2015 Kevin Ryde This file is part of HTML-FormatExternal. HTML-FormatExternal is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. HTML-FormatExternal is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with HTML-FormatExternal. 
If not, see http://www.gnu.org/licenses/ Version 26, August 2015 - tests again fix no "http:" on cygwin Version 25, August 2015 - tests don't attempt filename like "http:" on Mac, MS-DOS, etc since on cygwin it depends on the external program also being a cygwin build Version 24, June 2015 - Netrik allow for no TERM environment variable (RT #105221 and Debian #788639) Version 23, April 2015 - new wide char handling - Lynx new unique_links option - Netrik avoid program_full_version() opening start page of ~/.netrikrc Version 22, June 2013 - really don't distribute file "http:" this time Version 21, June 2013 - don't distribute file "http:" as it won't unpack on CP/M and similar, per report by Alexandr Ciornii RT#86008 Version 20, June 2013 - escape strange filenames like "-" or "%" - new HTML::FormatText::Vilistextum - use IPC::Run Version 19, February 2011 - fix Makefile.PL for perl 5.6 and earlier Version 18, September 2010 - test manifest only as an author test Version 17, March 2010 - newer File::Temp to avoid mutual misfeatures between it and File::Copy Version 16, March 2010 - new "base" option Version 15, February 2010 - new home page Version 14, February 2009 - new Html2text Version 13, December 2008 - new justify option for lynx - ignore unrecognised options, for safety - fix tests qr//m no good pre perl 5.10 Version 12, December 2008 - mucho test count fixes - cope with older links and lynx without nomargin Version 11, December 2008 - the first version HTML-FormatExternal-26/xtools/0002755000175000017500000000000012570233261014107 5ustar ggggHTML-FormatExternal-26/xtools/my-check-spelling.sh0000755000175000017500000000320612350135425017757 0ustar gggg#!/bin/sh # my-check-spelling.sh -- grep for spelling errors # Copyright 2009, 2010, 2011, 2012, 2013, 2014 Kevin Ryde # my-check-spelling.sh is shared by several distributions. 
# # my-check-spelling.sh is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by the # Free Software Foundation; either version 3, or (at your option) any later # version. # # my-check-spelling.sh is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . set -e # set -x # | tee /dev/stdout # -name samp -prune \ # -o -name formats -prune \ # -o -name "*~" -prune \ # -o -name "*.tar.gz" -prune \ # -o -name "*.deb" -prune \ # -o # -o -name dist-deb -prune \ # | egrep -v '(Makefile|dist-deb)' \ if find . -name my-check-spelling.sh -prune \ -o -type f -print0 \ | xargs -0 egrep --color=always -nHi 'optino|recurrance|nineth|\bon on\b|\bto to\b|tranpose|adjustement|glpyh|rectanglar|availabe|grabing|cusor|refering|writeable|nineth|\bommitt?ed|omited|[$][rd]elf|requrie|noticable|continous|existant|explict|agument|destionation|\bthe the\b|\bfor for\b|\bare have\b|\bare are\b|\bwith with\b|\bin in\b|\b[tw]hen then\b|\bnote sure\b|\bnote yet\b|correspondance|sprial|wholely|satisif|\bteh\b|\btje\b' then # nothing found exit 1 else exit 0 fi HTML-FormatExternal-26/xtools/my-pc.sh0000755000175000017500000000323312206324154015470 0ustar gggg#!/bin/sh # my-pc.sh -- run cpants_lint kwalitee checker # Copyright 2009, 2010, 2011, 2012, 2013 Kevin Ryde # my-pc.sh is shared by several distributions. # # my-pc.sh is free software; you can redistribute it # and/or modify it under the terms of the GNU General Public License as # published by the Free Software Foundation; either version 3, or (at your # option) any later version. 
#
# my-pc.sh is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this file. If not, see <http://www.gnu.org/licenses/>.

set -x

# PERLRUNINST=`sed -n 's/^PERLRUNINST = \(.*\)/\1/p' Makefile`
# if test -z "$PERLRUNINST"; then
#   echo "PERLRUNINST not found"
#   exit 1
# fi

EXE_FILES=`sed -n 's/^EXE_FILES = \(.*\)/\1/p' Makefile`
TO_INST_PM=`find lib -name \*.pm`

LINT_FILES="Makefile.PL $EXE_FILES $TO_INST_PM"

# "test -e" on a quoted glob only looks for a file literally named "*.t",
# so let ls expand the pattern to see whether anything matches
if ls t/*.t >/dev/null 2>&1; then
  LINT_FILES="$LINT_FILES t/*.t"
fi
if ls xt/*.t >/dev/null 2>&1; then
  LINT_FILES="$LINT_FILES xt/*.t"
fi
for i in t xt examples devel; do
  if ls "$i"/*.pl >/dev/null 2>&1; then
    LINT_FILES="$LINT_FILES $i/*.pl"
  fi
  if ls "$i"/*.pm >/dev/null 2>&1; then
    LINT_FILES="$LINT_FILES $i/*.pm"
  fi
done

perl -e 'use Test::Vars; all_vars_ok()'

# MyMakeMakerExtras_Pod_Coverage
perl -e 'use Pod::Coverage package => $class'

# grep -E for alternation; a single "$" anchors each suffix (the doubled
# "$$" was Makefile escaping and matches nothing in a plain shell script)
podlinkcheck -I lib `ls $LINT_FILES | grep -E -v '\.bash$|\.desktop$|\.png$|\.xpm$'`
podchecker -nowarnings `ls $LINT_FILES | grep -E -v '\.bash$|\.desktop$|\.png$|\.xpm$'`
perlcritic $LINT_FILES
HTML-FormatExternal-26/xtools/my-deb.sh0000755000175000017500000000710312536744717015641 0ustar gggg
#!/bin/sh

# my-deb.sh -- make .deb

# Copyright 2009, 2010, 2011, 2012, 2013, 2014, 2015 Kevin Ryde

# my-deb.sh is shared by several distributions.
#
# my-deb.sh is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 3, or (at your option) any later
# version.
#
# my-deb.sh is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
# # You should have received a copy of the GNU General Public License along # with this file. If not, see . # warnings::unused broken by perl 5.14, so use 5.10 for checks set -e set -x DISTNAME=`sed -n 's/^DISTNAME = \(.*\)/\1/p' Makefile` if test -z "$DISTNAME"; then echo "DISTNAME not found" exit 1 fi DISTVNAME=`sed -n 's/^DISTVNAME = \(.*\)/\1/p' Makefile` if test -z "$DISTVNAME"; then echo "DISTVNAME not found" exit 1 fi VERSION=`sed -n 's/^VERSION = \(.*\)/\1/p' Makefile` if test -z "$VERSION"; then echo "VERSION not found" exit 1 fi XS_FILES=`sed -n 's/^XS_FILES = \(.*\)/\1/p' Makefile` EXE_FILES=`sed -n 's/^EXE_FILES = \(.*\)/\1/p' Makefile` if test -z "$XS_FILES" then DPKG_ARCH=all else DPKG_ARCH=`dpkg --print-architecture` fi # programs named after the dist, libraries named with "lib" # gtk2-ex-splash and wx-perl-podbrowser programs are lib too though DEBNAME=`echo $DISTNAME | tr A-Z a-z` DEBNAME=`echo $DEBNAME | sed 's/app-//'` case "$EXE_FILES" in gtk2-ex-splash|wx-perl-podbrowser|'') DEBNAME="lib${DEBNAME}-perl" ;; esac DEBVNAME="${DEBNAME}_$VERSION-0.1" DEBFILE="${DEBVNAME}_$DPKG_ARCH.deb" # ExtUtils::MakeMaker 6.42 of perl 5.10.0 makes "$(DISTVNAME).tar.gz" depend # on "$(DISTVNAME)" distdir directory, which is always non-existent after a # successful dist build, so the .tar.gz is always rebuilt. # # So although the .deb depends on the .tar.gz don't express that here or it # rebuilds the .tar.gz every time. # # The right rule for the .tar.gz would be to depend on the files which go # into it of course ... # # DISPLAY is unset for making a deb since under fakeroot gtk stuff may try # to read config files like ~/.pangorc from root's home dir /root/.pangorc, # and that dir will be unreadable by ordinary users (normally), provoking # warnings and possible failures from nowarnings(). 
# test -f $DISTVNAME.tar.gz || make $DISTVNAME.tar.gz debver="`dpkg-parsechangelog -c1 | sed -n -r -e 's/^Version: (.*)-[0-9.]+$/\1/p'`" echo "debver $debver", want $VERSION test "$debver" = "$VERSION" rm -rf $DISTVNAME tar xfz $DISTVNAME.tar.gz unset DISPLAY; export DISPLAY cd $DISTVNAME dpkg-checkbuilddeps debian/control fakeroot debian/rules binary cd .. rm -rf $DISTVNAME #------------------------------------------------------------------------------ # lintian .deb and source lintian -I -i \ --suppress-tags new-package-should-close-itp-bug,desktop-entry-contains-encoding-key \ $DEBFILE TEMP="/tmp/temp-lintian-$DISTVNAME" rm -rf $TEMP mkdir $TEMP cp $DISTVNAME.tar.gz $TEMP/${DEBNAME}_$VERSION.orig.tar.gz cd $TEMP tar xfz ${DEBNAME}_$VERSION.orig.tar.gz if test "$DISTVNAME" != "$DEBNAME-$VERSION"; then mv -T $DISTVNAME $DEBNAME-$VERSION fi dpkg-source -b $DEBNAME-$VERSION \ ${DEBNAME}_$VERSION.orig.tar.gz; \ lintian -I -i \ --suppress-tags maintainer-upload-has-incorrect-version-number,changelog-should-mention-nmu,empty-debian-diff,debian-rules-uses-deprecated-makefile *.dsc cd / rm -rf $TEMP exit 0 HTML-FormatExternal-26/xtools/my-check-copyright-years.sh0000755000175000017500000000445612530216640021302 0ustar gggg#!/bin/sh # my-check-copyright-years.sh -- check copyright years in dist # Copyright 2009, 2010, 2011, 2012, 2013, 2014, 2015 Kevin Ryde # my-check-copyright-years.sh is shared by several distributions. # # my-check-copyright-years.sh is free software; you can redistribute it # and/or modify it under the terms of the GNU General Public License as # published by the Free Software Foundation; either version 3, or (at your # option) any later version. # # my-check-copyright-years.sh is distributed in the hope that it will be # useful, but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General # Public License for more details. 
# # You should have received a copy of the GNU General Public License along # with this file. If not, see . set -e # die on error set -x # echo # find files in the dist with mod times this year, but without this year in # the copyright line if test -z "$DISTVNAME"; then DISTVNAME=`sed -n 's/^DISTVNAME = \(.*\)/\1/p' Makefile` fi case $DISTVNAME in *\$*) DISTVNAME=`make echo-DISTVNAME` ;; esac if test -z "$DISTVNAME"; then echo "DISTVNAME not set and not in Makefile" exit 1 fi TARGZ="$DISTVNAME.tar.gz" if test -e "$TARGZ"; then :; else pwd echo "TARGZ $TARGZ not found" exit 1 fi MY_HIDE= year=`date +%Y` result=0 # files with dates $year tar tvfz $TARGZ \ | egrep "$year-|debian/copyright" \ | sed "s:^.*$DISTVNAME/::" \ | { while read i do # echo "consider $i" GREP=grep case $i in \ '' | */ \ | ppport.h \ | debian/changelog | debian/doc-base \ | debian/compat | debian/emacsen-compat | debian/source/format \ | debian/patches/*.diff \ | COPYING | MANIFEST* | SIGNATURE | META.yml | META.json \ | version.texi | */version.texi \ | *utf16* | examples/rs''s2lea''fnode.conf \ | */MathI''mage/ln2.gz | */MathI''mage/pi.gz \ | *.mo | *.locatedb* | t/samp.* \ | t/empty.dat | t/*.xpm | t/*.xbm | t/*.jpg | t/*.gif \ | t/*.g${MY_HIDE}d \ | */_whizzy*) continue ;; *.gz) GREP=zgrep esac; \ if test -e "$srcdir/$i" then f="$srcdir/$i" else f="$i" fi if $GREP -q -e "Copyright.*$year" $f then :; else echo "$i:1: this file" grep Copyright $f result=1 fi done } exit $result HTML-FormatExternal-26/xt/0002755000175000017500000000000012570233261013212 5ustar ggggHTML-FormatExternal-26/xt/0-Test-ConsistentVersion.t0000644000175000017500000000225311655356324020161 0ustar gggg#!/usr/bin/perl -w # 0-Test-ConsistentVersion.t -- run Test::ConsistentVersion if available # Copyright 2011 Kevin Ryde # 0-Test-ConsistentVersion.t is shared by several distributions. 
# # 0-Test-ConsistentVersion.t is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # 0-Test-ConsistentVersion.t is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General # Public License for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . use 5.004; use strict; use Test::More; eval { require Test::ConsistentVersion } or plan skip_all => "due to Test::ConsistentVersion not available -- $@"; Test::ConsistentVersion::check_consistent_versions (no_readme => 1, # no version number in my READMEs no_pod => 1, # no version number in my docs, at the moment ); # ! -e 'README'); exit 0; HTML-FormatExternal-26/xt/0-Test-Synopsis.t0000755000175000017500000000176411655356314016321 0ustar gggg#!/usr/bin/perl -w # 0-Test-Synopsis.t -- run Test::Synopsis if available # Copyright 2009, 2010, 2011 Kevin Ryde # 0-Test-Synopsis.t is shared by several distributions. # # 0-Test-Synopsis.t is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by the # Free Software Foundation; either version 3, or (at your option) any later # version. # # 0-Test-Synopsis.t is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . 
use 5.004; use strict; use Test::More; eval 'use Test::Synopsis; 1' or plan skip_all => "due to Test::Synopsis not available -- $@"; ## no critic (ProhibitCallsToUndeclaredSubs) all_synopsis_ok(); exit 0; HTML-FormatExternal-26/xt/More.t0000644000175000017500000001214212153267605014305 0ustar gggg#!/usr/bin/perl # Copyright 2008, 2009, 2010, 2013 Kevin Ryde # This file is part of HTML-FormatExternal. # # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use 5.008; use Encode; use HTML::FormatExternal; use Test::More tests => 29; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings() } use HTML::FormatText::Elinks; use HTML::FormatText::Links; use HTML::FormatText::Lynx; use HTML::FormatText::Netrik; use HTML::FormatText::Vilistextum; use HTML::FormatText::W3m; use HTML::FormatText::Zen; # uncomment this to run the ### lines # use Smart::Comments; #------------------------------------------------------------------------------ # vilistextum version 2.6.9 complains # "?? getopt returned character code 0%o %c??\n" # if -u, -c or -y used when no --enable-multibyte # my $vilistextum_have_multibyte = do { require File::Spec; require IPC::Run; my $str; eval { IPC::Run::run(['vilistextum','-u'], '<',File::Spec->devnull, '>',\$str, '2>&1') }; defined $str && $str !~ /getopt/ ? 
  1 : 0
};
diag "vilistextum_have_multibyte is $vilistextum_have_multibyte";

#------------------------------------------------------------------------------
# links: U+263A input becomes ":-)" in latin1 output
#
SKIP: {
  my $class = 'HTML::FormatText::Links';
  $class->program_version or skip "$class not available", 1;
  diag $class;

  my $input_charset = 'utf-8';
  my $output_charset = 'latin-1';
  my $html = "\x{263A}";
  $html = Encode::encode ($input_charset, $html);
  my $str = $class->format_string ($html,
                                   input_charset => $input_charset,
                                   output_charset => $output_charset);
  like ($str, qr/\Q:-)/,
        "$class U+263A smiley $input_charset -> $output_charset");
}

# lynx undocumented 'justify' option
#
SKIP: {
  my $class = 'HTML::FormatText::Lynx';
  $class->program_version or skip "$class not available", 1;
  diag $class;

  my $html = "x y z aaaa";
  $html = Encode::encode ('utf-8', $html);
  my $str = $class->format_string ($html,
                                   leftmargin => 0,
                                   rightmargin => 7,
                                   justify => 1);
  like ($str, qr/^x y z$/m,
        "$class justify option");
}

foreach my $class ('HTML::FormatText::Elinks',
                   'HTML::FormatText::Links',
                   'HTML::FormatText::Lynx',
                   # 'HTML::FormatText::Netrik', # no charsets
                   'HTML::FormatText::Vilistextum',
                   'HTML::FormatText::W3m',
                   # 'HTML::FormatText::Zen', # no charsets
                  ) {
 SKIP: {
    diag $class;
    $class->program_full_version
      or skip "$class program not available", 3;

    my $input_charset = 'utf-8';
    my $output_charset = 'iso-8859-1';
    # 12 bytes of opening tags, 2 bytes of utf-8 degree sign,
    # 15 bytes of closing tags plus newline
    my $html = "<html><body>\x{B0}</body></html>\n";
    $html = Encode::encode ($input_charset, $html);
    is (length($html), 12+2+15);
    my $str = $class->format_string ($html,
                                     input_charset => $input_charset,
                                     output_charset => $output_charset);
    my $degree_bytes = "\x{B0}";
    $degree_bytes = Encode::encode ($output_charset, $degree_bytes);
    is (length($degree_bytes), 1);
    like ($str, qr/\Q$degree_bytes/,
          "$class degree sign $input_charset -> $output_charset");
    ### $str
  }
}

foreach my $class ('HTML::FormatText::Elinks',
                   # 'HTML::FormatText::Links', # no utf-8 output
                   'HTML::FormatText::Lynx',
                   #
                   # 'HTML::FormatText::Netrik', # no charsets
                   'HTML::FormatText::Vilistextum',
                   'HTML::FormatText::W3m',
                   # 'HTML::FormatText::Zen', # no charsets
                  ) {
 SKIP: {
    diag $class;
    $class->program_full_version
      or skip "$class program not available", 3;

    if ($class eq 'HTML::FormatText::Vilistextum'
        && ! $vilistextum_have_multibyte) {
      skip "vilistextum not built with multibyte", 3;
    }

    my $input_charset = 'iso-8859-1';
    my $output_charset = 'utf-8';
    # 12 bytes of opening tags, 1 byte of latin-1 degree sign,
    # 15 bytes of closing tags plus newline
    my $html = "<html><body>\x{B0}</body></html>\n";
    $html = Encode::encode ($input_charset, $html);
    is (length($html), 12+1+15);
    my $str = $class->format_string ($html,
                                     input_charset => $input_charset,
                                     output_charset => $output_charset);
    my $degree_bytes = "\x{B0}";
    $degree_bytes = Encode::encode ($output_charset, $degree_bytes);
    is (length($degree_bytes), 2);
    like ($str, qr/\Q$degree_bytes/,
          "$class degree sign $input_charset -> $output_charset");
  }
}

exit 0;
HTML-FormatExternal-26/xt/0-no-debug-left-on.t0000755000175000017500000000651512044143060016577 0ustar gggg
#!/usr/bin/perl -w

# 0-no-debug-left-on.t -- check no Smart::Comments left on

# Copyright 2011, 2012 Kevin Ryde

# 0-no-debug-left-on.t is shared by several distributions.
#
# 0-no-debug-left-on.t is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# 0-no-debug-left-on.t is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
# Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this file. If not, see <http://www.gnu.org/licenses/>.

# cf Test::NoSmartComments which uses Module::ScanDeps.
require 5;
use strict;

Test::NoDebugLeftOn->Test_More(verbose => 0);
exit 0;

package Test::NoDebugLeftOn;
use strict;
use ExtUtils::Manifest;

sub Test_More {
  my ($class, %options) = @_;
  require Test::More;
  Test::More::plan (tests => 1);
  Test::More::ok ($class->check (diag => \&Test::More::diag,
                                 %options));
  1;
}

sub check {
  my ($class, %options) = @_;
  my $diag = $options{'diag'};
  if (! -e 'Makefile.PL') {
    &$diag ('skip, no Makefile.PL so not ExtUtils::MakeMaker');
    return 1;
  }

  my $href = ExtUtils::Manifest::maniread();
  my @files = keys %$href;

  my $good = 1;

  my @perl_files = grep {m{
                           ^lib/
                         |^(lib|examples|x?t)/.*\.(p[lm]|t)$
                         |^Makefile.PL$
                         |^[^/]+$
                       }x
                     } @files;
  my $filename;
  foreach $filename (@perl_files) {
    if ($options{'verbose'}) {
      &$diag ("perl file ",$filename);
    }
    if (! open FH, "< $filename") {
      &$diag ("Oops, cannot open $filename: $!");
      $good = 0;
      next;
    }
    # read the file a line at a time, stopping at __END__
    while (<FH>) {
      if (/^__END__/) {
        last;
      }
      # only a DEBUG=> non-zero number is bad, so an expression can copy a
      # debug from another package
      if (/(DEBUG\s*=>\s*[1-9][0-9]*)/
          || /^[ \t]*((use|no) (Smart|Devel)::Comments)/
          || /^[ \t]*(use lib\b.*devel.*)/
         ) {
        print STDERR "\n$filename:$.: leftover: $_\n";
        $good = 0;
      }
    }
    if (! close FH) {
      &$diag ("Oops, error closing $filename: $!");
      $good = 0;
      next;
    }
  }

  my @C_files = grep {m{
                        # toplevel or lib .c and .xs files
                        ^[^/]*\.([ch]|xs)$
                      |^(lib|examples|x?t)/.*\.([ch]|xs)$
                    }x
                  } @files;
  foreach $filename (@C_files) {
    if ($options{'verbose'}) {
      &$diag ("C/XS file ",$filename);
    }
    if (! open FH, "< $filename") {
      &$diag ("Oops, cannot open $filename: $!");
      $good = 0;
      next;
    }
    while (<FH>) {
      if (/^#\s*define\s+DEBUG\s+[1-9]/
         ) {
        print STDERR "\n$filename:$.: leftover: $_\n";
        $good = 0;
      }
    }
    if (!
close FH) { &$diag ("Oops, error closing $filename: $!"); $good = 0; next; } } &$diag ("checked ",scalar(@perl_files)," perl files, ", scalar(@C_files)," C/XS files\n"); return $good; } HTML-FormatExternal-26/xt/0-file-is-part-of.t0000644000175000017500000000622212536755447016453 0ustar gggg#!/usr/bin/perl -w # Copyright 2011, 2012, 2013, 2015 Kevin Ryde # 0-file-is-part-of.t is shared by several distributions. # # 0-file-is-part-of.t is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # 0-file-is-part-of.t is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General # Public License for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . require 5; use strict; use Test::More tests => 1; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings(); } ok (Test::FileIsPartOfDist->check(verbose=>1), 'Test::FileIsPartOfDist'); exit 0; package Test::FileIsPartOfDist; BEGIN { require 5 } use strict; use ExtUtils::Manifest; use File::Slurp; # uncomment this to run the ### lines # use Smart::Comments; sub import { my $class = shift; my $arg; foreach $arg (@_) { if ($arg eq '-test') { require Test; Test::plan(tests=>1); is ($class->check, 1, 'Test::FileIsPartOfDist'); } } return 1; } sub new { my $class = shift; return bless { @_ }, $class; } sub check { my $class = shift; my $self = $class->new(@_); my $manifest = ExtUtils::Manifest::maniread(); if (! $manifest) { $self->diag("no MANIFEST perhaps"); return 0; } my @filenames = keys %$manifest; my $distname = $self->makefile_distname; if (! 
defined $distname) { $self->diag("Oops, DISTNAME not found in Makefile"); return 0; } if ($self->{'verbose'}) { $self->diag("DISTNAME $distname"); } my $good = 1; my $filename; foreach $filename (@filenames) { if (! $self->check_file_is_part_of($filename,$distname)) { $good = 0; } } return $good; } sub makefile_distname { my ($self) = @_; my $filename = "Makefile"; my $content = File::Slurp::read_file ($filename); if (! defined $content) { $self->diag("Cannot read $filename: $!"); return undef; } my $distname; if ($content =~ /^DISTNAME\s*=\s*([^#\n]*)/m) { $distname = $1; $distname =~ s/\s+$//; ### $distname if ($distname eq 'App-Chart') { $distname = 'Chart'; } # hack } return $distname; } sub check_file_is_part_of { my ($self, $filename, $distname) = @_; my $content = File::Slurp::read_file ($filename); if (! defined $content) { $self->diag("Cannot read $filename: $!"); return 0; } $content =~ /([T]his file is part of[^\n]*)/i or return 1; my $got = $1; if ($got =~ /[T]his file is part of \Q$distname\E\b/i) { return 1; } $self->diag("$filename: $got"); $self->diag("expected DISTNAME: $distname"); return 0; } sub diag { my $self = shift; my $func = $self->{'diag_func'} || eval { Test::More->can('diag') } || \&_diag; &$func(@_); } sub _diag { my $msg = join('', map {defined($_)?$_:'[undef]'} @_)."\n"; $msg =~ s/^/# /mg; print STDERR $msg; } HTML-FormatExternal-26/xt/0-Test-DistManifest.t0000755000175000017500000000231111655356331017050 0ustar gggg#!/usr/bin/perl -w # 0-Test-DistManifest.t -- run Test::DistManifest if available # Copyright 2009, 2010, 2011 Kevin Ryde # 0-Test-DistManifest.t is shared by several distributions. # # 0-Test-DistManifest.t is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. 
# # 0-Test-DistManifest.t is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General # Public License for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . use 5.004; use strict; use Test::More; # This is only an author test really and it only really does much in a # working directory where newly added files will exist. In a dist dir # something would have to be badly wrong for the manifest to be off. eval { require Test::DistManifest } or plan skip_all => "due to Test::DistManifest not available -- $@"; Test::DistManifest::manifest_ok(); exit 0; HTML-FormatExternal-26/xt/0-examples-xrefs.t0000644000175000017500000000430012230011245016457 0ustar gggg#!/usr/bin/perl -w # Copyright 2011, 2013 Kevin Ryde # 0-examples-xrefs.t is shared by several distributions. # # 0-examples-xrefs.t is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # 0-examples-xrefs.t is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General # Public License for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . 
BEGIN { require 5 }
use strict;
use ExtUtils::Manifest;
use Test::More;

use lib 't';
use MyTestHelpers;
BEGIN { MyTestHelpers::nowarnings(); }

my $manifest = ExtUtils::Manifest::maniread();
my @example_files = grep m{examples/.*\.pl$}, keys %$manifest;
my @lib_files = grep m{lib/.*\.(pm|pod)$}, keys %$manifest;

sub any_file_contains_example {
  my ($example) = @_;
  my $filename;
  foreach $filename (@lib_files) {
    if (pod_contains_example($filename, $example)) {
      return 1;
    }
  }
  foreach $filename (@example_files) {
    if ($filename ne $example
        && raw_contains_example($filename, $example)) {
      return 1;
    }
  }
  return 0;
}

sub pod_contains_example {
  my ($filename, $example) = @_;
  open FH, "< $filename" or die "Cannot open $filename: $!";
  my $content = do { local $/; <FH> }; # slurp
  close FH or die "Error closing $filename: $!";
  return scalar ($content =~ /F<\Q$example\E>
                             |F<examples>\s+directory
                             /xs);
}

sub raw_contains_example {
  my ($filename, $example) = @_;
  $example =~ s{^examples/}{};
  open FH, "< $filename" or die "Cannot open $filename: $!";
  my $ret = scalar (grep /\b\Q$example\E\b/, <FH>);
  close FH or die "Error closing $filename: $!";
  return $ret > 0;
}

plan tests => scalar(@example_files) + 1;
my $example;
foreach $example (@example_files) {
  is (any_file_contains_example($example), 1,
      "$example mentioned in some lib/ file");
}
ok(1);
exit 0;
HTML-FormatExternal-26/xt/0-Test-YAML-Meta.t0000755000175000017500000000345512347217470016115 0ustar gggg
#!/usr/bin/perl -w

# 0-Test-YAML-Meta.t -- run Test::CPAN::Meta::YAML if available

# Copyright 2009, 2010, 2011, 2013, 2014 Kevin Ryde

# 0-Test-YAML-Meta.t is shared by several distributions.
#
# 0-Test-YAML-Meta.t is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by the
# Free Software Foundation; either version 3, or (at your option) any later
# version.
# # 0-Test-YAML-Meta.t is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . use 5.004; use strict; use Test::More; my $meta_filename = 'META.yml'; unless (-e $meta_filename) { plan skip_all => "$meta_filename doesn't exist -- assume this is a working directory not a dist"; } plan tests => 3; SKIP: { eval { require CPAN::Meta::Validator; 1 } or skip "due to CPAN::Meta::Validator not available -- $@"; eval { require YAML; 1 } or skip "due to YAML module not available -- $@", 1; diag "CPAN::Meta::Validator version ", CPAN::Meta::Validator->VERSION; my $struct = YAML::LoadFile ($meta_filename); my $cmv = CPAN::Meta::Validator->new($struct); ok ($cmv->is_valid); if (! $cmv->is_valid) { diag "CPAN::Meta::Validator errors:"; foreach ($cmv->errors) { diag $_; } } } { # Test::CPAN::Meta::YAML version 0.15 for upper case "optional_features" names # eval 'use Test::CPAN::Meta::YAML 0.15; 1' or plan skip_all => "due to Test::CPAN::Meta::YAML 0.15 not available -- $@"; Test::CPAN::Meta::YAML::meta_spec_ok('META.yml'); } exit 0; HTML-FormatExternal-26/xt/my-manifest.sh0000755000175000017500000000165211764227757016025 0ustar gggg#!/bin/sh # my-manifest.sh -- update MANIFEST file # Copyright 2009, 2010, 2011, 2012 Kevin Ryde # my-manifest.sh is shared by several distributions. # # my-manifest.sh is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by the # Free Software Foundation; either version 3, or (at your option) any later # version. # # my-manifest.sh is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . set -e if [ -e MANIFEST ]; then mv MANIFEST MANIFEST.old || true fi touch SIGNATURE ( make manifest 2>&1; diff -u MANIFEST.old MANIFEST ) | ${PAGER:-more} HTML-FormatExternal-26/xt/0-META-read.t0000755000175000017500000001071512136177162015205 0ustar gggg#!/usr/bin/perl -w # 0-META-read.t -- check META.yml can be read by various YAML modules # Copyright 2009, 2010, 2011, 2012, 2013 Kevin Ryde # 0-META-read.t is shared among several distributions. # # 0-META-read.t is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by the Free # Software Foundation; either version 3, or (at your option) any later # version. # # 0-META-read.t is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . use 5.005; use strict; use Test::More; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings(); } # When some of META.yml is generated by explicit text in Makefile.PL it can # be easy to make a mistake in the syntax, or indentation, etc, so the idea # here is to check it's readable from some of the YAML readers. # # The various readers differ in how strictly they look at the syntax. # There's no attempt here to say one of them is best or tightest or # whatever, just see that they all work. # # See 0-Test-YAML-Meta.t for Test::YAML::Meta which looks into field # contents, as well as maybe the YAML formatting. 
my $meta_filename;

# allow for ancient perl, maybe
eval { require FindBin; 1 } # new in 5.004
  or plan skip_all => "FindBin not available -- $@";
eval { require File::Spec; 1 } # new in 5.005
  or plan skip_all => "File::Spec not available -- $@";

diag "FindBin $FindBin::Bin";
$meta_filename = File::Spec->catfile
  ($FindBin::Bin, File::Spec->updir, 'META.yml');
-e $meta_filename
  or plan skip_all => "$meta_filename doesn't exist -- assume this is a working directory not a dist";

plan tests => 5;

SKIP: {
  eval { require YAML; 1 }
    or skip "due to YAML module not available -- $@", 1;

  my $ok = eval { YAML::LoadFile ($meta_filename); 1 }
    or diag "YAML::LoadFile() error -- $@";
  ok ($ok, "Read $meta_filename with YAML module");
}

# YAML 0.68 is in fact YAML::Old, or something weird -- don't think they can
# load together
#
# SKIP: {
#   eval { require YAML::Old; 1 }
#     or skip 'due to YAML::Old not available -- $@', 1;
#
#   eval { YAML::Old::LoadFile ($meta_filename) };
#   is ($@, '',
#       "Read $meta_filename with YAML::Old");
# }

SKIP: {
  eval { require YAML::Syck; 1 }
    or skip "due to YAML::Syck not available -- $@", 1;

  my $ok = eval { YAML::Syck::LoadFile ($meta_filename); 1 }
    or diag "YAML::Syck::LoadFile() error -- $@";
  ok ($ok, "Read $meta_filename with YAML::Syck");
}

SKIP: {
  eval { require YAML::Tiny; 1 }
    or skip "due to YAML::Tiny not available -- $@", 1;

  my $ok = eval { YAML::Tiny->read ($meta_filename); 1 }
    or diag "YAML::Tiny->read() error -- $@";
  ok ($ok, "Read $meta_filename with YAML::Tiny");
}

SKIP: {
  eval { require YAML::XS; 1 }
    or skip "due to YAML::XS not available -- $@", 1;

  my $ok = eval { YAML::XS::LoadFile ($meta_filename); 1 }
    or diag "YAML::XS::LoadFile() error -- $@";
  ok ($ok, "Read $meta_filename with YAML::XS");
}

# Parse::CPAN::Meta describes itself for use on "typical" META.yml, so not
# sure if demanding it works will more exercise its subset of yaml than the
# correctness of our META.yml. At any rate might like to know if it fails,
# so as to avoid tricky yaml for everyone's benefit, maybe.
#
SKIP: {
  eval { require Parse::CPAN::Meta; 1 }
    or skip "due to Parse::CPAN::Meta not available -- $@", 1;

  my $ok = eval { Parse::CPAN::Meta::LoadFile ($meta_filename); 1 }
    or diag "Parse::CPAN::Meta::LoadFile() error -- $@";
  ok ($ok, "Read $meta_filename with Parse::CPAN::Meta::LoadFile");
}

# Data::YAML::Reader 0.06 doesn't like header "--- #YAML:1.0" with the
# # part produced by other YAML writers, so skip for now
#
# SKIP: {
#   eval { require Data::YAML::Reader; 1 }
#     or skip 'due to Data::YAML::Reader not available -- $@', 1;
#
#   my $reader = Data::YAML::Reader->new;
#   open my $fh, '<', $meta_filename
#     or die "Cannot open $meta_filename";
#   my $str = do { local $/=undef; <$fh> };
#   close $fh or die;
#
#   # if ($str !~ /\.\.\.$/) {
#   #   $str .= "...";
#   # }
#   my @lines = split /\n/, $str;
#   push @lines, "...";
#   use Data::Dumper;
#   print Dumper(\@lines);
#
#   # { local $,="\n"; print @lines,"\n"; }

exit 0;
HTML-FormatExternal-26/xt/0-Test-Pod.t
#!/usr/bin/perl -w

# 0-Test-Pod.t -- run Test::Pod if available

# Copyright 2009, 2010, 2011 Kevin Ryde

# 0-Test-Pod.t is shared by several distributions.
#
# 0-Test-Pod.t is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 3, or (at your option) any later
# version.
#
# 0-Test-Pod.t is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with this file. If not, see <http://www.gnu.org/licenses/>.
use 5.004;
use strict;
use Test::More;

# all_pod_files_ok() is new in Test::Pod 1.00
#
eval 'use Test::Pod 1.00; 1'
  or plan skip_all => "due to Test::Pod 1.00 not available -- $@";

Test::Pod::all_pod_files_ok();

exit 0;
HTML-FormatExternal-26/Makefile.PL
#!/usr/bin/perl -w

# Copyright 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Kevin Ryde

# This file is part of HTML-FormatExternal.
#
# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal. If not, see <http://www.gnu.org/licenses/>.

use 5.006;
use strict;
use warnings;
use ExtUtils::MakeMaker;

my %PREREQ_PM = (
                 'constant::defer' => 0,
                 'IPC::Run' => 0,
                 'URI::file' => 0.08, # version 0.08 for new_abs()

                 # Version 0.18 for overloaded eq() which File::Copy
                 # calls :-(. Only actually needed for the "base" option
                 # with input from a file.
                 'File::Temp' => 0.18,

                 # Version 0.80 for File::Spec->devnull(), which came
                 # with Perl 5.6.0 already in fact.
'File::Spec' => 0.80, ); my %TEST_REQUIRES = ( # for the t/*.t tests 'Test::More' => 0, ); unless (eval { ExtUtils::MakeMaker->VERSION(6.64) }) { # fallback if ExtUtils::MakeMaker doesn't know TEST_REQUIRES %PREREQ_PM = (%PREREQ_PM, %TEST_REQUIRES); } WriteMakefile (NAME => 'HTML::FormatExternal', ABSTRACT => 'HTML to text formatting using external programs.', VERSION_FROM => 'lib/HTML/FormatExternal.pm', MIN_PERL_VERSION => '5.006', PREREQ_PM => \%PREREQ_PM, TEST_REQUIRES => \%TEST_REQUIRES, AUTHOR => 'Kevin Ryde ', LICENSE => 'gpl_3', SIGN => 1, META_MERGE => { 'meta-spec' => { version => 2 }, no_index => { directory=>['devel','xt'] }, resources => { homepage => 'http://user42.tuxfamily.org/html-formatexternal/index.html', license => 'http://www.gnu.org/licenses/gpl.html', }, prereqs => { test => { suggests => { 'HTML::TreeBuilder' => 0, 'Taint::Util' => 0, }, }, }, }, ); HTML-FormatExternal-26/examples/0002755000175000017500000000000012570233261014375 5ustar ggggHTML-FormatExternal-26/examples/demo.pl0000755000175000017500000000232312153223307015654 0ustar gggg#!/usr/bin/perl -w # Copyright 2008, 2010, 2013 Kevin Ryde # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use HTML::FormatText::Lynx; my $html = <<'HERE'; A Page

Hello this is some sample html input, with a link to your local host's toplevel index file.

HERE my $str = HTML::FormatText::Lynx->format_string ($html, leftmargin => 5, rightmargin => 40); print $str; exit 0; HTML-FormatExternal-26/inc/0002755000175000017500000000000012570233261013330 5ustar ggggHTML-FormatExternal-26/inc/my_pod2html0000755000175000017500000000750712175610062015522 0ustar gggg#!/usr/bin/perl # my_pod2html -- convert POD to HTML, with some mangling # Copyright 2009, 2010, 2011, 2013 Kevin Ryde # my_pod2html is shared by several distributions. # # my_pod2html is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by the Free # Software Foundation; either version 3, or (at your option) any later # version. # # my_pod2html is distributed in the hope that it will be useful, but WITHOUT # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or # FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for # more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . use strict; use warnings; #use Smart::Comments; my $pod2html = MyPod2HTML->new; #

and

both too big, in mozilla at least $pod2html->html_h_level(3); $pod2html->parse_from_file(@ARGV); exit 0; package MyPod2HTML; use base 'Pod::Simple::HTML'; our $VERSION = 1; use constant DEBUG => 0; my %table = ('apt-file' => 'http://packages.debian.org/apt-file', 'apt-cache' => 'http://packages.debian.org/apt', 'apt-rdepends' => 'http://packages.debian.org/apt-rdepends', 'gtk-options' => 'http://manpages.ubuntu.com/manpages/jaunty/man7/gtk-options.7.html', 'xsetroot' => 'http://www.x.org/archive/X1'.'1R7.5/doc/man/man1/xsetroot.1.html', 'leafnode' => 'http://leafnode.sourceforge.net', 'lynx' => 'http://lynx.isc.org/', 'feed2imap' => 'http://home.gna.org/feed2imap', # disguise from grep 'rss'.'2email' => 'http://rss'.'2email.infogami.com', 'rssdrop' => 'http://search.cpan.org/dist/rssdrop/', 'toursst' => 'http://packages.debian.org/etch/toursst', 'netrc' => 'http://linux.die.net/man/5/netrc', # no online man pages apparently at http://man-db.nongnu.org/ 'man' => 'http://ftp.parisc-linux.org/cgi-bin/man/man2html?man+1', 'lexgrog' => 'http://ftp.parisc-linux.org/cgi-bin/man/man2html?lexgrog+1', 'apropos' => 'http://ftp.parisc-linux.org/cgi-bin/man/man2html?apropos+1', ); sub resolve_man_page_link { my ($self, $to, $frag) = @_; $to = "$to"; # Pod::Simple::LinkSection object ### $to if (my ($page, $section) = ($to =~ /(.*)\(\d+\)$/)) { ### $page if (my $url = $table{$page}) { return $url; } } return shift->SUPER::resolve_man_page_link (@_); } sub resolve_pod_link_by_table { my ($self, $to, $section) = @_; my $url; if (defined $to) { if ($to eq 'AptPkg') { $url = 'http://packages.debian.org/libapt-pkg-perl'; } if ($to =~ /^Glib::Ex::(SourceIds|SignalIds|FreezeNotify|TieProperties)/) { $url = "http://user42.tuxfamily.org/glib-ex-ob"."jectbits/$1.html"; } if ($to eq 'Gtk2::Ex::Widget'.'Cursor') { $url = 'http://user42.tuxfamily.org/gtk2-ex-widget'.'cursor/Widget'.'Cursor.html'; } if ($to eq 'Ti'.'e:'.':TZ') { $url = 'http://user42.tuxfamily.org/ti'.'e-'.'tz/TZ.html'; } if 
($to eq 'Time:'.':TZ') { $url = 'http://user42.tuxfamily.org/ti'.'e-'.'tz/Time-'.'TZ.html'; } if ($to =~ /^(Glib|Gtk2)($|::(?!Ex::))/) { $to =~ s{::}{/}; $url = "http://gtk2-perl.sourceforge.net/doc/pod/$to.html" } if (defined $url) { return ($url . (defined $section && $section ne '' ? "#$section" : '')); } } return $self->SUPER::resolve_pod_link_by_table($to, $section); } # sub do_pod_link { # my($self, $link) = @_; # if (DEBUG) { # print "\nlink tag=",$link->tagname," type=",$link->attr('type'),"\n"; # print " to=",$link->attr('to')||'[none]',"\n"; # print " section=",$link->attr('section')||'[none]',"\n"; # } # # my $to = $link->attr('to') || ''; # undef if internal link # # return $self->SUPER::do_pod_link($link); # } HTML-FormatExternal-26/devel/0002755000175000017500000000000012570233261013656 5ustar ggggHTML-FormatExternal-26/devel/element-format.pl0000644000175000017500000000545112103121371017123 0ustar gggg#!/usr/bin/perl -w # Copyright 2013 Kevin Ryde # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . 
use strict; use warnings; use Module::Load; use HTML::TreeBuilder; use Data::Dumper; $Data::Dumper::Useqq = 1; use FindBin qw($Bin); my $class; $class = 'HTML::FormatText::WithLinks'; $class = 'HTML::FormatText::WithLinks::AndTables'; $class = 'HTML::FormatText::W3m'; $class = 'HTML::FormatText'; $class = 'HTML::FormatText::Netrik'; $class = 'HTML::FormatText::Links'; $class = 'HTML::FormatText::Html2text'; $class = 'HTML::FormatText::Elinks'; $class = 'HTML::FormatText::Lynx'; Module::Load::load ($class); # { # output_charset => 'ascii', # output_charset => 'ANSI_X3.4-1968', # output_charset => 'utf-8' my $output_charset = 'utf-8'; # input_charset => 'shift-jis', # input_charset => 'iso-8859-1', # input_charset => 'utf-8', my $input_charset; $input_charset = 'utf16le'; $input_charset = 'ascii'; my $formatter = $class->new ( input_charset => $input_charset, output_charset => $output_charset, rightmargin => 60, # leftmargin => 20, justify => 1, base => "http://foo.org/\x{2022}/foo.html", # lynx_options => [ '-underscore', # '-underline_links', # '-with_backspaces', # ], justify => 1, ); # { # my $filename = "$FindBin::Bin/base.html"; # # my $filename = "/tmp/rsquo.html"; # my $tree = HTML::TreeBuilder->new; # $tree->parse_file($filename); # } my $tree = HTML::Element->new('a', href => 'http://www.perl.com/'); $tree->push_content("The Perl Homepage"); print "$tree\n"; print $tree->as_HTML; my $str = $tree->format($formatter); $Data::Dumper::Purity = 1; print $str; print Data::Dumper->new([\$str],['output'])->Useqq(0)->Dump; print "utf8 flag ",(utf8::is_utf8($str) ? 
'yes' : 'no'), "\n"; exit 0; } HTML-FormatExternal-26/devel/wide.pl0000644000175000017500000000170512510414274015143 0ustar gggg#!/usr/bin/perl -w # Copyright 2015 Kevin Ryde # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use Data::Dumper; use HTML::FormatText; use charnames ':full'; $Data::Dumper::Useqq=1; my $html = "\x{263A}"; my $formatter = HTML::FormatText->new; my $str = $formatter->format_string($html); print Dumper(\$str); exit 0; HTML-FormatExternal-26/devel/base-utf16.html0000644000175000017500000000327011343641066016424 0ustar ggggÿþ<!-- -*- coding: utf-16le-with-signature-unix -*- Copyright 2010 Kevin Ryde HTML-FormatExternal is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. HTML-FormatExternal is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with HTML-FormatExternal. If not, see <http://www.gnu.org/licenses/>. 
-->
<html>
<head>
</head>
<body>
<a href="page.html">Foo</a>
<a href="http://absolute.org/anotherpage.html">Bar</a>
</body>
</html>
HTML-FormatExternal-26/devel/base.html

Foo Bar

underline HTML-FormatExternal-26/devel/utf.pl0000644000175000017500000000221211424406727015012 0ustar gggg#!/usr/bin/perl -w # Copyright 2008, 2009, 2010 Kevin Ryde # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use Module::Load; use Data::Dumper; use Encode; use charnames ':full'; $Data::Dumper::Useqq=1; foreach my $charset ('utf-16le','utf-16be','utf-32le','utf-32be') { foreach my $str ("\N{BYTE ORDER MARK}", 'Foo', 'http://foo.org/page.html') { my $bytes = Encode::encode($charset, $str); print Dumper(\$bytes); } print "\n"; } exit 0; HTML-FormatExternal-26/devel/margin12.html0000644000175000017500000000150612153475752016176 0ustar gggg This is the title 123 567 9012 abc def ghij
0 2 4 6 8 A C E
01 3 5 7 9 B D F
degrees: ° HTML-FormatExternal-26/devel/output-charset.pl0000644000175000017500000000304312327645473017214 0ustar gggg#!/usr/bin/perl -w # Copyright 2008, 2009, 2010, 2014 Kevin Ryde # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use Module::Load; foreach my $class ('HTML::FormatText::Elinks', 'HTML::FormatText::Html2text', 'HTML::FormatText::Lynx', 'HTML::FormatText::Links', 'HTML::FormatText::Netrik', 'HTML::FormatText::W3m', 'HTML::FormatText::Vilistextum', ) { Module::Load::load($class); my $name = $class; $name =~ s/.*:://; my $html_string = '

ö

'; my $text = $class->format_string ($html_string); printf "%-12s ", $name; if (! defined $text) { print "undef\n"; next; } $text =~ s/\n//g; $text =~ s/ //g; foreach my $i (0 .. length($text)-1) { print ord(substr($text,$i,1))," "; } if (length($text) == 0) { print "empty"; } print "\n"; } HTML-FormatExternal-26/devel/run.pl0000644000175000017500000003261312570226300015016 0ustar gggg#!/usr/bin/perl -w # Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use Module::Load; use Data::Dumper; $Data::Dumper::Useqq = 1; use FindBin qw($Bin); # uncomment this to run the ### lines use Smart::Comments; my $class; $class = 'HTML::FormatText::WithLinks'; $class = 'HTML::FormatText::WithLinks::AndTables'; $class = 'HTML::FormatText'; $class = 'HTML::FormatText::Netrik'; $class = 'HTML::FormatText::Elinks'; $class = 'HTML::FormatText::Html2text'; $class = 'HTML::FormatText::Lynx'; $class = 'HTML::FormatText::Links'; $class = 'HTML::FormatText::W3m'; $class = 'HTML::FormatText::Vilistextum'; Module::Load::load ($class); # { foreach my $class ('File::Spec::Cygwin', 'File::Spec::Epoc', 'File::Spec::Mac', 'File::Spec::OS2', 'File::Spec::Unix', 'File::Spec::VMS', 'File::Spec::Win32', ) { if (! eval "require $class; 1") { print "$@\n"; next; } my $filename = 'C:FOO'; my $is_absolute = $class->file_name_is_absolute($filename) ? 
1 : 0; my ($volume,$directories,$file) = $class->splitpath($filename); my $colon_is_ordinary = ($volume eq '' && $directories eq '' && ! $class->file_name_is_absolute($filename) ? 1 : 0); ### $class ### $is_absolute ### $volume ### $directories ### $file ### $colon_is_ordinary # my ($volume,$directories,$file) = File::Spec::Win32->splitpath('h:/x/y/foo'); } exit 0; } { my $input_filename = '/tmp/foo/http:'; require URI::file; my $str = URI::file->new_abs($input_filename)->as_string; print $str,"\n"; exit 0; } { # HTML::Tree require HTML::Element; my $a = HTML::Element->new('a', href=>'blah="=blah=\'='); $a->push_content("Hello \x{263A} world"); my $p = HTML::Element->new('p'); $p->push_content("Hello \x{263A} world"); $p->push_content($a); my $body = HTML::Element->new('body'); $body->insert_element($p); my $html = HTML::Element->new('body'); $html->insert_element($body); my $html_str = $html->as_HTML( '<>&' ); $Data::Dumper::Useqq=1; print Dumper(\$html_str); print "utf8 flag ",(utf8::is_utf8($html_str) ? 'yes' : 'no'), "\n"; print $html->as_HTML; require HTML::FormatText::Vilistextum; my $formatter = HTML::FormatText::Vilistextum->new; my $str = $formatter->format ($html); print Dumper(\$str); print "utf8 flag ",(utf8::is_utf8($str) ? 'yes' : 'no'), "\n"; require Scalar::Util; print "tainted ",(Scalar::Util::tainted($str) ? 'yes' : 'no'), "\n"; exit 0; } { # format_string() with wide chars # $ENV{PATH} = '/bin:/usr/bin'; my $html = "

Hello \x{263A} \x{2641} world A ÿ blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah

\n"; require HTML::FormatText::Zen; my $str = HTML::FormatText::Zen->format_string ($html, # output_wide => 'as_input', leftmargin => 10, ); $Data::Dumper::Useqq=1; print Dumper(\$str); print "utf8 flag ",(utf8::is_utf8($str) ? 'yes' : 'no'), "\n"; require Scalar::Util; print "tainted ",(Scalar::Util::tainted($str) ? 'yes' : 'no'), "\n"; print $str; exit 0; } { # duplicated links formatting foreach my $class ('HTML::FormatText::Netrik', 'HTML::FormatText::Links', 'HTML::FormatText::Html2text', 'HTML::FormatText::Lynx', 'HTML::FormatText::Elinks', 'HTML::FormatText::W3m', 'HTML::FormatText::Vilistextum', ) { print "\n$class\n"; Module::Load::load ($class); my $html = "

One

Two

underline

\n"; my $str = $class->format_string ($html, output_wide => 'as_input', # lynx_options => ['-underscore'], unique_links => 1, ); print $str; } exit 0; } { # program_version() of each module foreach my $class ('HTML::FormatText::Netrik', 'HTML::FormatText::Links', 'HTML::FormatText::Html2text', 'HTML::FormatText::Lynx', 'HTML::FormatText::Elinks', 'HTML::FormatText::W3m', 'HTML::FormatText::Vilistextum', ) { Module::Load::load ($class); my $version = $class->program_version; my $full = $class->program_full_version; ### $class ### $full ### $version } exit 0; } { # $ENV{PATH} = '/bin:/usr/bin'; require HTML::FormatText::Lynx; print "Lynx _have_nomargins(): ", (HTML::FormatText::Lynx->_have_nomargins() ? "yes" : "no"),"\n"; require HTML::FormatText::Links; print "Links _have_html_margin(): ", (HTML::FormatText::Links->_have_html_margin() ? "yes" : "no"),"\n"; require HTML::FormatText::Vilistextum; print "Vilistextum _have_multibyte(): ", (HTML::FormatText::Vilistextum->_have_multibyte() ? "yes" : "no"),"\n"; exit 0; } { # IPC::Run in taint mode # $ENV{PATH} = '/bin:/usr/bin'; my $str; require IPC::Run; IPC::Run::run(['echo','hello'], '>',\$str); # IPC::Run::run(['cat'], '<', \'hello', '>', \$str); ### $str exit 0; } { # taintedness of program_version() $ENV{PATH} = '/bin:/usr/bin'; require HTML::FormatText::W3m; my $str = HTML::FormatText::W3m->program_full_version; require Scalar::Util; print "tainted ",(Scalar::Util::tainted($str) ? 'yes' : 'no'), "\n"; exit 0; } { # format_file() with output_wide require HTML::FormatText::W3m; my $str = HTML::FormatText::W3m->format_file ('devel/base.html', output_wide => 1); $Data::Dumper::Useqq=1; print Dumper(\$str); print "utf8 flag ",(utf8::is_utf8($str) ? 'yes' : 'no'), "\n"; exit 0; } { # format_file() with base require HTML::FormatText::Elinks; my $str = HTML::FormatText::Elinks->format_file ('devel/base.html', base => 'http://localhost'); exit 0; } { # BOM on input # lynx recognises automatically my $html = "

Hello world

\n"; require Encode; $html = Encode::encode('utf-32',$html); # with BOM # $html = "\xFF\xFE\x00\x00" . Encode::encode('utf-32le',$html); # with BOM $html = ("\x20\x00\x00\x00" x 8) . $html; # BE spaces print "HTML input string:\n"; IPC::Run::run(['hd'],'<',\$html, '>','/tmp/hd.txt'); IPC::Run::run(['cat'],'<','/tmp/hd.txt'); require HTML::FormatText::Lynx; my $text = HTML::FormatText::Lynx->format_string ($html, input_charset=>'UTF-32', # output_charset=>'UTF-8', output_wide => 1, # base => 'http://localhost', ); print "Text output:\n"; print $text; IPC::Run::run(['hd'],'<',\$text, '>','/tmp/hd.txt'); IPC::Run::run(['cat'],'<','/tmp/hd.txt'); for my $i (0 .. length($text)-1) { my $c = substr($text,$i,1); if (ord($c) >= 128) { printf "0x%X\n", ord($c); } } exit 0; } { # entities POSIX::setlocale (POSIX::LC_CTYPE(), "C"); foreach my $class ('HTML::FormatText::Netrik', 'HTML::FormatText::Links', 'HTML::FormatText::Html2text', 'HTML::FormatText::Lynx', 'HTML::FormatText::Elinks', 'HTML::FormatText::W3m', 'HTML::FormatText::Vilistextum', ) { print "--------------------\n$class\n"; Module::Load::load ($class); my $html = "

\xA2 ☺ ◖

"; my $str = $class->format_string ($html # input_charset => $input_charset, # output_charset => $output_charset, ); print $str; $Data::Dumper::Useqq=1; print Dumper(\$str); print "utf8 flag ",(utf8::is_utf8($str) ? 'yes' : 'no'), "\n"; } exit 0; } { # my $filename = "$FindBin::Bin/x.html"; # $filename = "/tmp/z.html"; # my $filename = "$FindBin::Bin/base.html"; # my $filename = "$FindBin::Bin/margin12.html"; my $filename = "t/%57"; # my $filename = "/tmp/rsquo.html"; # output_charset => 'ascii', # output_charset => 'ANSI_X3.4-1968', # output_charset => 'utf-8' my $output_charset = 'utf-8'; # input_charset => 'shift-jis', # input_charset => 'iso-8859-1', # input_charset => 'utf-8', my $input_charset; $input_charset = 'utf16le'; $input_charset = 'ascii'; $input_charset = 'latin-1'; require File::Copy; print "File::Copy ",File::Copy->VERSION, "\n"; my $str = $class->format_file ($filename, # rightmargin => 12, # # leftmargin => 20, # justify => 1, # # base => "http://foo.org/\x{2022}/foo.html", # input_charset => $input_charset, output_charset => $output_charset, # # lynx_options => [ '-underscore', # # '-underline_links', # # '-with_backspaces', # # ], # justify => 1, ); $Data::Dumper::Purity = 1; print "$class on $filename\n"; print $str; print Data::Dumper->new([\$str],['output'])->Useqq(0)->Dump; print "utf8 flag ",(utf8::is_utf8($str) ? 
'yes' : 'no'), "\n"; exit 0; } { require I18N::Langinfo; require POSIX; POSIX::setlocale (POSIX::LC_CTYPE(), "C"); my $charset = I18N::Langinfo::langinfo (I18N::Langinfo::CODESET()); print "charset $charset\n"; exit 0; } { foreach my $class (qw(HTML::FormatText::Elinks HTML::FormatText::Html2text HTML::FormatText::Lynx HTML::FormatText::Links HTML::FormatText::Netrik HTML::FormatText::W3m HTML::FormatText::Zen)) { system "perl", "-Mblib", "-M$class", "-e", "print 'ok $class\n'"; } exit 0; } { require HTML::FormatText::Lynx; print "Lynx ", HTML::FormatText::Lynx->program_version, " _have_nomargins ", (HTML::FormatText::Lynx->_have_nomargins?"yes":"no"),"\n"; require HTML::FormatText::Html2text; print "Html2text ", HTML::FormatText::Html2text->program_version, " _have_ascii ", (HTML::FormatText::Html2text->_have_ascii?"yes":"no"),"\n"; require HTML::FormatText::Links; print "Links ", HTML::FormatText::Links->program_version, " _have_html_margin ", (HTML::FormatText::Links->_have_html_margin?"yes":"no"),"\n"; exit 0; } { my $html_str = <<"HERE"; A Page

Hello fjkd jfksd jfk \x{263A} sdjkf jsk fjsdk fjskd jfksd jfks djfk sdjfk sdjkf jsdkf jsdk fjksd fjksd jfksd jfksd jfk sdjfk sdjkf sdjkf sdjkbhjhh world

\x{263A}\x{263A}\x{263A}\x{263A} \x{263A}\x{263A}\x{263A} \x{263A}\x{263A}\x{263A} \x{263A}\x{263A}\x{263A} \x{263A}\x{263A}\x{263A} \x{263A}\x{263A}\x{263A} \x{263A}\x{263A}\x{263A} \x{263A}\x{263A}\x{263A}

HERE print "utf8 flag ",(utf8::is_utf8($html_str) ? 'yes' : 'no'), "\n"; my $str = $class->format_string ($html_str, # justify => 1, rightmargin => 40, leftmargin => 10, ); print $str; print Dumper($str); print "utf8 flag ",(utf8::is_utf8($str) ? 'yes' : 'no'), "\n"; exit 0; } { my $str = $class->format_string ('

Hello ');
  print $str;
  exit 0;
}

# if ($class !~ /Lynx/) {
#   # old lynx, eg. 2.8.1, doesn't have -display_charset for output_charset
#   my $help = $class->_run_version ('lynx', '-help');
#   my $have_display_charset = (defined $help
#                               && $help =~ /-display_charset/);
#   if ($charset ne 'ascii' && ! $have_display_charset) {
#     skip "this lynx doesn't have -display_charset", 2;
#   }
# }
HTML-FormatExternal-26/devel/slurp.pl
#!/usr/bin/perl -w

# Copyright 2012, 2013 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.

use strict;
use warnings;
use Data::Dumper;

# uncomment this to run the ### lines
use Smart::Comments;

{
  require IPC::Run;
  my $infile = '/dev/null';
  my $stdout;
  my $stderr;
  my $ret = eval { IPC::Run::run(['lynx','-dump'],
                                 '<', $infile,
                                 '>', \$stdout,
                                 '2>', \$stderr) };
  my $err = $@;
  ### $stdout
  ### $stderr
  ### $ret
  ### $err
  exit 0;
}

{
  require Perl6::Slurp;
  my $str = eval { Perl6::Slurp::slurp ('-|', 'nosuchprogramname') };
  print $str//'undef';
  exit 0;
}

{
  my $in;
  if (! open $in, '-|', 'nosuchprogram', '--version') {
    my $e1 = $@;
    my $e2 = $!;
    print Dumper($e1);
    print Dumper($e2);
  }
  undef $in;

  local $SIG{__WARN__} = sub { $_[0] =~ /Can't exec/ or warn $_[0] };
  if (! open $in, '-|', 'nosuchprogram', '--version') {
    my $e1 = $@;
    my $e2 = $!;
    print Dumper($e1);
    print Dumper($e2);
  }
  exit 0;
}

#------------------------------------------------------------------------------
# Old code:
#
# # In Perl6::Slurp version 0.03 open() gives its usual warning if it can't
# # run the program, but Perl6::Slurp then croaks with that same message.
# # Suppress the warning in the interests of avoiding duplication.
# #
# sub _slurp_nowarn {
#   require Perl6::Slurp;
#   # no warning suppression when debugging
#   local $SIG{__WARN__} = (DEBUG ? $SIG{__WARN__} : \&_warn_suppress_exec);
#   return Perl6::Slurp::slurp (@_);
# }
# sub _warn_suppress_exec {
#   $_[0] =~ /Can't exec/ or warn $_[0];
# }

#   '-|',

# require Perl6::Slurp;
# my $str = do {
#   local %ENV = %ENV;
#   @ENV{keys %$env} = values %$env;  # overrides out of subclasses
#   Perl6::Slurp::slurp (@command);
# };

# Perl6::Slurp demands 5.8 anyway, don't think need to ask for 5.8 here to
# be sure of getting multi-arg open() of piped command in that module
# use 5.008;
HTML-FormatExternal-26/lib/HTML/FormatExternal.pm
# Copyright 2008, 2009, 2010, 2011, 2012, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.


# Maybe:
#     capture error output
#     errors_to => \$var
#     combine error messages
#

package HTML::FormatExternal;
use 5.006;
use strict;
use warnings;
use Carp;
use File::Spec 0.80; # version 0.80 of perl 5.6.0 or thereabouts for devnull()
use IPC::Run;

# uncomment this to run the ### lines
# use Smart::Comments;

our $VERSION = 26;

sub new {
  my ($class, %self) = @_;
  return bless \%self, $class;
}
sub format {
  my ($self, $html) = @_;
  if (ref $html) { $html = $html->as_HTML; }
  return $self->format_string ($html, %$self);
}

use constant _WIDE_INPUT_CHARSET => 'UTF-8';
use constant _WIDE_OUTPUT_CHARSET => 'UTF-8';

# format_string() takes the easy approach of putting the string in a temp
# file and letting format_file() do the real work.  The formatter programs
# can generally read stdin and write stdout, so might do that with select()
# to simultaneously write and read back.
#
sub format_string {
  my ($class, $html_str, %options) = @_;

  my $fh = _tempfile();
  my $input_wide = eval { utf8::is_utf8($html_str) };
  _output_wide(\%options, $input_wide);

  # insert <base> while in wide chars
  if (defined $options{'base'}) {
    $html_str = _base_prefix(\%options, $html_str, $input_wide);
  }

  if ($input_wide) {
    if (! $options{'input_charset'}) {
      $options{'input_charset'} = $class->_WIDE_INPUT_CHARSET;
    }
    ### input_charset for wide: $options{'input_charset'}
    if ($options{'input_charset'} eq 'entitize') {
      $html_str = _entitize($html_str);
      delete $options{'input_charset'};
    } else {
      my $layer = ":encoding($options{'input_charset'})";
      binmode ($fh, $layer) or die 'Cannot add layer ',$layer;
    }
  }

  do {
    print $fh $html_str
      and close($fh)
    } || die 'Cannot write temp file: ',$!;

  return $class->format_file ($fh->filename, %options);
}

# Left margin is synthesized by adding spaces afterwards because the various
# programs have pretty variable support for a specified margin.
#   * w3m doesn't seem to have a left margin option at all
#   * lynx has one but it's too well hidden in its style sheet or something
#   * elinks has document.browse.margin_width but it's limited to 8 or so
#   * netrik doesn't seem to have one at all
#   * vilistextum has a "spaces" internally for lists etc but no apparent
#     way to initialize from the command line
#
sub format_file {
  my ($class, $filename, %options) = @_;

  # If neither leftmargin nor rightmargin are specified then '_width' is
  # unset and the _make_run() funcs leave it to the program defaults.
  #
  # If either leftmargin or rightmargin are set then '_width' is established
  # and the _make_run() funcs use it and a zero left margin, then the
  # actual left margin is applied below.
  #
  # The DEFAULT_LEFTMARGIN and DEFAULT_RIGHTMARGIN establish the defaults
  # when just one of the two is set.  Not good hard coding those values,
  # but the programs don't have anything to set one but not the other.
  #
  my $leftmargin = $options{'leftmargin'};
  my $rightmargin = $options{'rightmargin'};
  if (defined $leftmargin || defined $rightmargin) {
    if (! defined $leftmargin)  { $leftmargin = $class->DEFAULT_LEFTMARGIN; }
    if (! defined $rightmargin) { $rightmargin = $class->DEFAULT_RIGHTMARGIN; }
    $options{'_width'} = $rightmargin - $leftmargin;
  }

  _output_wide(\%options, 0);  # file input is reckoned as not wide
  if ($options{'output_wide'}) {
    $options{'output_charset'} ||= $class->_WIDE_OUTPUT_CHARSET;
  }

  my $tempfh;
  if (defined $options{'base'}) {
    # insert <base> by copying to a temp file

    # File::Copy rudely calls eq() to compare $from and $to.  Need either
    # File::Temp 0.18 to have that work on $tempfh, or File::Copy 2.??? for
    # it to check an overload method exists first.  Newer File::Temp is
    # available from cpan, where File::Copy may not be, so ask for
    # File::Temp 0.18.
    require File::Temp;
    File::Temp->VERSION(0.18);

    # must sysread()/syswrite() because that's what File::Copy does (as of
    # its version 2.30) so anything held in the perl buffering by the normal
    # read() is lost.

    my $initial;
    my $fh;
    do {
      open $fh, '<', $filename
        and binmode $fh
        and defined (sysread $fh, $initial, 4)
      } || croak "Cannot open $filename: $!";
    ### $initial

    $initial = _base_prefix(\%options, $initial, 0);

    $tempfh = _tempfile();
    $tempfh->autoflush(1);
    require File::Copy;
    do {
      defined(syswrite($tempfh, $initial))
        and File::Copy::copy($fh, $tempfh)
        and close $tempfh
        and close $fh
      } || croak "Cannot copy $filename to temp file: $!";

    $filename = $tempfh->filename;
  }

  # # dump the file being crunched
  # print "Bytes passed to program:\n";
  # IPC::Run::run(['hd'], '<',$filename, '|',['cat']);

  # _make_run() can set $options{'ENV'} too
  my ($command_aref, @run) = $class->_make_run($filename, \%options);
  my $env = $options{'ENV'} || {};

  ### $command_aref
  ### @run
  ### $env

  if (! @run) {
    push @run, '<', File::Spec->devnull;
  }

  my $str;
  {
    local %ENV = (%ENV, %$env); # overrides from _make_run()
    eval { IPC::Run::run($command_aref,
                         @run,
                         '>', \$str,
                         # FIXME: what to do with stderr ?
                         # '2>', File::Spec->devnull,
                        ) };
  }
  _die_on_insecure();

  ### $str
  ### final output_wide: $options{'output_wide'}

  if ($options{'output_wide'}) {
    require Encode;
    $str = Encode::decode ($options{'output_charset'}, $str);
  }

  if ($leftmargin) {
    my $fill = ' ' x $leftmargin;
    $str =~ s/^(.)/$fill$1/mg;  # non-empty lines only
  }
  return $str;
}

# most program running errors are quietly ignored for now, but re-throw
# "Insecure $ENV{PATH}" when cannot run due to taintedness.
sub _die_on_insecure {
  if ($@ =~ /^Insecure/) {
    die $@;
  }
}

sub _run_version {
  my ($self_or_class, $command_aref, @ipc_options) = @_;
  ### _run_version() ...
  ### $command_aref
  ### @ipc_options

  if (! @ipc_options) {
    @ipc_options = ('2>', File::Spec->devnull);
  }

  my $version;  # left undef if any exec/slurp problem
  eval { IPC::Run::run($command_aref,
                       '<', File::Spec->devnull,
                       '>', \$version,
                       @ipc_options) };

  # strip blank lines at end of lynx, maybe others
  if (defined $version) {
    $version =~ s/\n+$/\n/s;
  }
  return $version;
}

# return a File::Temp filehandle object
sub _tempfile {
  require File::Temp;
  my $fh = File::Temp->new (TEMPLATE => 'HTML-FormatExternal-XXXXXX',
                            SUFFIX => '.html',
                            TMPDIR => 1);
  binmode($fh) or die 'Oops, cannot set binmode() on temp file';

  ### tempfile: $fh->filename
  #  $fh->unlink_on_destroy(0);  # to preserve for debugging ...

  return $fh;
}

sub _output_wide {
  my ($options, $input_wide) = @_;
  if (! defined $options->{'output_wide'}
      || $options->{'output_wide'} eq 'as_input') {
    $options->{'output_wide'} = $input_wide;
  }
}

# $str is HTML or some initial bytes.
# Return a new string with <base> at the start.
#
sub _base_prefix {
  my ($options, $str, $input_wide) = @_;
  my $base = delete $options->{'base'};
  ### _base_prefix: $base

  $base = "$base";           # stringize possible URI object
  $base = _entitize($base);  # probably shouldn't be any non-ascii in a url
  $base = "<base href=\"$base\">\n";

  my $pos = 0;
  unless ($input_wide) {
    # encode $base in the input_charset, and possibly after a BOM.
    #
    # Lynx recognises a BOM, if it doesn't have other -assume_charset.  It
    # recognises it only at the start of the file, so must insert <base>
    # after it here to preserve that feature of Lynx.
    #
    # If input_charset is utf-32 or utf-16 then it seems reasonable to step
    # over any BOM.  But Lynx for some reason doesn't like a BOM together
    # with utf-32 or utf-16 specified.  Dunno if that's a bug or a feature
    # on its part.

    my $input_charset = $options->{'input_charset'};
    if (! defined $input_charset || lc($input_charset) eq 'utf-32') {
      if ($str =~ /^\000\000\376\377/) {
        $input_charset = 'utf-32be';
        $pos = 4;
      } elsif ($str =~ /^\377\376\000\000/) {
        $input_charset = 'utf-32le';
        $pos = 4;
      }
    }
    if (! defined $input_charset || lc($input_charset) eq 'utf-16') {
      if ($str =~ /^\376\377/) {
        $input_charset = 'utf-16be';
        $pos = 2;  # a UTF-16 BOM is two bytes
      } elsif ($str =~ /^\377\376/) {
        $input_charset = 'utf-16le';
        $pos = 2;
      }
    }
    if (defined $input_charset) {
      # encode() errors out if unknown charset, and doesn't exist for older
      # Perl, in which case leave $base as ascii.  May not be right, but
      # ought to work with the various ASCII superset encodings.
      eval {
        require Encode;
        $base = Encode::encode ($input_charset, $base);
      };
    }
  }
  substr($str, $pos,0, $base);  # insert $base at $pos
  return $str;
}

# return $str with non-ascii replaced by &#NNN; entities
sub _entitize {
  my ($str) = @_;
  $str =~ s{([^\x20-\x7E])}{'&#'.ord($1).';'}eg;
  ### $str
  return $str;
}

1;
__END__

=for stopwords HTML-FormatExternal formatter formatters charset charsets TreeBuilder ie latin-1 config Elinks absolutized tty Ryde filename recognise BOM UTF entitized unrepresentable untaint superset onwards overstriking

=head1 NAME

HTML::FormatExternal - HTML to text formatting using external programs

=head1 DESCRIPTION

This is a collection of formatter modules which turn HTML into plain text by
dumping it through the respective external programs.

    HTML::FormatText::Elinks
    HTML::FormatText::Html2text
    HTML::FormatText::Links
    HTML::FormatText::Lynx
    HTML::FormatText::Netrik
    HTML::FormatText::Vilistextum
    HTML::FormatText::W3m
    HTML::FormatText::Zen

The module interfaces are compatible with C<HTML::Formatter> modules such as
C<HTML::FormatText>, but the external programs do all the work.

Common formatting options are used where possible, such as C<leftmargin> and
C<rightmargin>.  So just by switching the class you can use a different
program (or the plain C<HTML::FormatText>) according to personal preference,
or strengths and weaknesses, or what you've got.

There's nothing particularly difficult about piping through these programs,
but a unified interface hides details like how to set margins and how to
force input or output charsets.

=head1 FUNCTIONS

Each of the classes above provides the following functions.
The C<XXX> in the class names here is a placeholder for any of C<Elinks>,
C<Lynx>, etc as above.

See F in the HTML-FormatExternal sources for a complete sample program.

=head2 Formatter Compatible Functions

=over 4

=item C<< $text = HTML::FormatText::XXX->format_file ($filename, key=>value,...) >>

=item C<< $text = HTML::FormatText::XXX->format_string ($html_string, key=>value,...) >>

Run the formatter program over a file or string with the given options and
return the formatted result as a string.  See L</OPTIONS> below for possible
key/value options.  For example,

    $text = HTML::FormatText::Lynx->format_file ('/my/file.html');

    $text = HTML::FormatText::W3m->format_string
      ('<html><body> <p> Hello world! </p> </body></html>');

C<format_file()> ensures any C<$filename> is interpreted as a filename (by
escaping as necessary against however the programs interpret command line
arguments).

=item C<< $formatter = HTML::FormatText::XXX->new (key=>value, ...) >>

Create a formatter object with the given options.  In the current
implementation an object doesn't do much more than remember the options for
future use.

    $formatter = HTML::FormatText::Elinks->new(rightmargin => 60);

=item C<< $text = $formatter->format ($tree_or_string) >>

Run the C<$formatter> program on a C<HTML::TreeBuilder> tree or a string,
using the options in C<$formatter>, and return the result as a string.

A TreeBuilder argument (ie. a C<HTML::Element>) is accepted for
compatibility with C<HTML::Formatter>.  The tree is simply turned into a
string with C<< $tree->as_HTML >> to pass to the program, so if you've got a
string already then give that instead of a tree.

C<HTML::Element> itself has a C<format()> method (see
L<HTML::Element/format>) which runs a given C<$formatter>.  A
C<HTML::FormatExternal> object can be used for C<$formatter>.

    $text = $tree->format($formatter);

    # which dispatches to
    $text = $formatter->format($tree);

=back

=head2 Extra Functions

The following are extra methods not available in the plain
C<HTML::FormatText>.

=over 4

=item C<< HTML::FormatText::XXX->program_version () >>

=item C<< HTML::FormatText::XXX->program_full_version () >>

=item C<< $formatter->program_version () >>

=item C<< $formatter->program_full_version () >>

Return the version number of the formatter program as reported by its
C<--version> or similar option.  If the formatter program is not available
then return C<undef>.

C<program_version()> is the bare version number, perhaps with "beta" or
similar indication.  C<program_full_version()> is the entire version output,
which may include build options, copyright notice, etc.

    $str = HTML::FormatText::Lynx->program_version();
    # eg. "2.8.7dev.10"

    $str = HTML::FormatText::W3m->program_full_version();
    # eg. "w3m version w3m/0.5.2, options lang=en,m17n,image,..."

The version number of the respective Perl module itself is available in the
usual way (see L<UNIVERSAL/VERSION>).
    $modulever = HTML::FormatText::Netrik->VERSION;
    $modulever = $formatter->VERSION

=back

=head1 CHARSETS

File or byte string input is by default interpreted by the programs in their
usual ways.  This should mean HTML Latin-1 but user configurations might
override that and some programs recognise a C<< <meta> >> charset
declaration or a Unicode BOM.  The C<input_charset> option below can force
the input charset.

A Perl wide-character input string is encoded and passed to the program in
whatever way it best understands.  Usually this is UTF-8 but in some cases
it is entitized instead.  The C<input_charset> option can force the input
charset to use if for some reason UTF-8 is not best.

The output string is either bytes or wide chars.  By default output is the
same as input, so wide char string input gives wide output and byte input
string or file input gives byte output.  The C<output_wide> option can force
the output type (and is the way to get wide chars back from
C<format_file()>).

Byte output is whatever the program produces.  Its default might be the
locale charset or other user configuration which suits direct display to the
user's terminal.  The C<output_charset> option can force the output to be
certain or to be ready for further processing.

Wide char output is done by choosing the best output charset the program can
do and decoding its output.  Usually this means UTF-8 but some of the
programs may only have less.  The C<output_charset> option can force the
charset used and decoded.  If it's something less than UTF-8 then some
programs might for example give ASCII art approximations of otherwise
unrepresentable characters.

Byte input is usual for HTML downloaded from a HTTP server or from a MIME
email and the headers have the C<charset> which applies.  Byte output is
good to go straight out to a tty or back to more MIME etc.  The input and
output charsets could differ if a server gives something other than what you
want for final output.

Wide chars are most convenient for crunching text within Perl.  The default
wide input giving wide output is designed to be transparent for this.
For reference, if a C<HTML::Element> tree contains wide char strings then
its usual C<as_HTML()> method, which is used by C<format()> above, produces
wide char HTML so the formatters here give wide char text.  Actually
C<as_HTML()> produces all ASCII because its default behaviour is to entitize
anything "unsafe", but it's still a wide char string so the formatted output
text is wide.

=head1 OPTIONS

The following options can be given to the constructor or to the formatting
methods.  The defaults are whatever the respective programs do.  The
programs generally read their config files when dumping so the defaults and
formatting details may follow the user's personal preferences.  Usually this
is a good thing.

=over 4

=item C<< leftmargin => INTEGER >>

=item C<< rightmargin => INTEGER >>

The column numbers for the left and right hand ends of the text.
C<leftmargin> 0 means no padding on the left.  C<rightmargin> is the text
width, so for instance 60 would mean the longest line is 60 characters
(inclusive of any C<leftmargin>).  These options are compatible with
C<HTML::FormatText>.

C<rightmargin> is not necessarily a hard limit.  Some of the programs will
exceed it in a HTML literal C<< <pre> >>, or a run of C<&nbsp;> or similar.

=item C<< input_charset => STRING >>

Force the HTML input to be interpreted as bytes of the given charset,
irrespective of locale, user configuration, C<< <meta> >> in the HTML, etc.

=item C<< output_charset => STRING >>

Force the text output to be encoded as the given charset.  The default
varies among the programs, but usually defaults to the locale.

=item C<< output_wide => 0,1,"as_input" >>

Select output string as wide characters rather than bytes.  The default is
C<"as_input"> which means a wide char input string results in a wide char
output string and a byte input or file input is byte output.  See
L</CHARSETS> above for how wide characters work.

Bytes or wide chars output can be forced by 0 or 1 respectively.  For
example to get wide char output when formatting a file,

    $wide_char_text = HTML::FormatText::W3m->format_file
                       ('/my/file.html', output_wide => 1);

=item C<< base => STRING >>

Set the base URL for any relative links within the HTML (similar to
C<< <base> >>).  Usually this should be the location the
HTML was downloaded from.

If the document contains its own C<< <base> >> setting then currently the
document takes precedence.  Only Lynx and Elinks display absolutized link
targets and the option has no effect on the other programs.

=back
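
As a rough illustration of where a C<base> prefix has to go in byte input,
the following standalone sketch steps over a UTF-16 or UTF-32 byte order
mark before inserting the tag, and encodes the tag to match.  The helper
name C<prefix_after_bom> and the example URL are hypothetical; this is not
part of the module interface, and the module's own handling also takes any
C<input_charset> into account.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: insert $tag into $bytes after any UTF-16/32 BOM,
# encoding $tag in the charset the BOM indicates.
sub prefix_after_bom {
  my ($bytes, $tag) = @_;
  my ($pos, $charset) = (0, undef);
  # test the 4-byte UTF-32 marks first, since UTF-32LE starts like UTF-16LE
  if    ($bytes =~ /^\x00\x00\xFE\xFF/) { ($pos, $charset) = (4, 'UTF-32BE'); }
  elsif ($bytes =~ /^\xFF\xFE\x00\x00/) { ($pos, $charset) = (4, 'UTF-32LE'); }
  elsif ($bytes =~ /^\xFE\xFF/)         { ($pos, $charset) = (2, 'UTF-16BE'); }
  elsif ($bytes =~ /^\xFF\xFE/)         { ($pos, $charset) = (2, 'UTF-16LE'); }
  if (defined $charset) {
    $tag = Encode::encode($charset, $tag);
  }
  substr($bytes, $pos, 0, $tag);   # 4-arg substr inserts at $pos
  return $bytes;
}

# UTF-16LE document with a BOM
my $doc = "\xFF\xFE" . Encode::encode('UTF-16LE', '<p>hi</p>');
my $out = prefix_after_bom($doc, qq{<base href="http://example.com/">\n});
```

The BOM stays as the first bytes of the result, which matters for a program
that only recognises it at the very start of the file.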

=head1 TAINT MODE

The formatter modules can be used under C<perl -T> taint mode.  They run
external programs so it's necessary to untaint C<$ENV{PATH}> in the usual
way per L<perlsec/Taint mode>.

The formatted text strings returned are always tainted, on the basis that
they use or include data from outside the Perl program.  The
C<program_version()> and C<program_full_version()> strings are tainted too.

=head1 BUGS

C<leftmargin> is implemented by adding spaces to the program output.  For
byte output this is ASCII spaces and that will be badly wrong for unusual
output like UTF-16 which is not a byte superset of ASCII.  For wide char
output the margin is applied after decoding to wide chars so is correct.
It'd be better to ask the programs to do the margin but their options for
that are poor.
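
One way around the byte output limitation, sketched here under the
assumption that the caller asks for wide char output and encodes it
afterwards, is to apply the margin to the decoded text.  The
C<apply_leftmargin> helper is hypothetical and only mirrors the
space-filling described above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: indent the non-empty lines of a wide char string.
sub apply_leftmargin {
  my ($text, $leftmargin) = @_;
  my $fill = ' ' x $leftmargin;
  $text =~ s/^(.)/$fill$1/mg;   # "." does not match "\n", so empty lines are skipped
  return $text;
}

my $wide   = "caf\x{E9}\n\nbar\n";               # decoded formatter output
my $padded = apply_leftmargin($wide, 3);
my $bytes  = Encode::encode('UTF-16LE', $padded); # encode only after the margin
```

Because the spaces are added as characters before encoding, each one
becomes a proper two-byte UTF-16 code unit instead of a stray single byte.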

There's nothing done with errors or warning messages from the programs.
Generally they make a best effort on doubtful HTML, but fatal errors like
bad options or missing libraries ought to be somehow trapped.

=head1 OTHER POSSIBILITIES

C (from Aug 2008 onwards) and C can produce ANSI escapes for
colours, underline, etc, and C and C can produce tty style
backspace overstriking.  This might be good for text destined for a tty or
further crunching.  Perhaps an C or C option could enable this,
where possible, but for now it's deliberately turned off in those programs
to keep the default as plain text.

=head1 SEE ALSO

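L<HTML::FormatText::Elinks>,
L<HTML::FormatText::Html2text>,
L<HTML::FormatText::Links>,
L<HTML::FormatText::Lynx>,
L<HTML::FormatText::Netrik>,
L<HTML::FormatText::Vilistextum>,
L<HTML::FormatText::W3m>,
L<HTML::FormatText::Zen>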

L,
L,
L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2011, 2012, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/Html2text.pm
# Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.

package HTML::FormatText::Html2text;
use 5.006;
use strict;
use warnings;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

# uncomment this to run the ### lines
# use Smart::Comments;

our $VERSION = 26;

use constant DEFAULT_LEFTMARGIN => 0;
use constant DEFAULT_RIGHTMARGIN => 79;

my $have_ascii;
my $have_utf8;
use constant::defer _check_help => sub {  # run once only
  my ($class) = @_;
  my $help = $class->_run_version (['html2text', '-help']);
  $have_ascii = (defined $help && $help =~ /-ascii/);
  $have_utf8  = (defined $help && $help =~ /-utf8/);
  return undef;
};

# return true if the "-ascii" option is available (new in html2text
# version 1.3.2 from Jan 2004)
sub _have_ascii {
  my ($class) = @_;
  $class->_check_help();
  return $have_ascii;
}

# return true if the "-utf8" option is available (a Debian addition circa 2009)
sub _have_utf8 {
  my ($class) = @_;
  $class->_check_help();
  return $have_utf8;
}

# The Debian -utf8 option can give UTF-8 output.
# For input believe entitized is the only way to be confident of working
# with both original and Debian extended.
#
use constant _WIDE_INPUT_CHARSET => 'entitize';
sub _WIDE_OUTPUT_CHARSET {
  my ($class) = @_;
  return ($class->_have_utf8() ? 'UTF-8' : 'iso-8859-1');
}

sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['html2text','-version'], '2>&1');
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # eg. "This is html2text, version 1.3.2a"
  $version =~ /^.*version (.*)/
    or $version =~ /^(.*)/;  # whole first line if format not recognised
  return $1 . substr($version,0,0);  # retain taintedness
}

sub _make_run {
  my ($class, $input_filename, $options) = @_;

  # -nobs means don't do underlining with "_ backspace X" sequences.
  # Backspaces are fun for teletype output, but the intention here is plain
  # text.  The Debian html2text has -nobs by default anyway.
  #
  my @command = ('html2text', '-nobs');

  if (defined $options->{'_width'}) {
    push @command, '-width', $options->{'_width'};
  }

  if ($class->_have_ascii) {
    if (my $output_charset = $options->{'output_charset'}) {
      $output_charset = lc($output_charset);
      if ($output_charset eq 'ascii' || $output_charset eq 'ansi_x3.4-1968') {
        push @command, '-ascii';
      }
    }
  }

  # 'html2text_options' not documented ...
  push @command, @{$options->{'html2text_options'} || []};

  # "html2text -" input filename "-" means read standard input.
  # Any other "-foo" starting with "-" is an option and there's no apparent
  # "--" to mark the end of options (as of its version 1.3.2a).
  #
  # Normally html2text takes URL style file: or http:, but the debian
  # version mangles it to a bare filename only.  This makes it hard to
  # escape a name suitably to get through both.  Instead use standard input
  # which both versions read by default.

  return (\@command,
          '<', $input_filename);
}

1;
__END__

=for stopwords HTML-FormatExternal html2text formatters ascii charset latin-1 Ryde recognised entitized UTF

=head1 NAME

HTML::FormatText::Html2text - format HTML as plain text using html2text

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::Html2text;
 $text = HTML::FormatText::Html2text->format_file ($filename);
 $text = HTML::FormatText::Html2text->format_string ($html_string);

 $formatter = HTML::FormatText::Html2text->new;
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::Html2text> turns HTML into plain text using the
C<html2text> program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by html2text.

See C<HTML::FormatExternal> for the formatting functions and options, with
the following caveats,

=over 4

=item C<input_charset>

Currently this option has no effect.  Input generally has to be latin-1
only, though the Debian extended C<html2text> interprets a C<< <meta> >>
charset directive in the HTML header.

Various C<&> style named or numbered entities are recognised and result in
suitable output.  The suggestion would be entitized input for maximum
portability among C<html2text> versions.

=item C<output_charset>

If set to "ascii" or "ANSI_X3.4-1968" (both case-insensitive) the
C<-ascii> option is used, when available (C<html2text> 1.3.2 from
Jan 2004).

If set to "UTF-8" then the Debian extension C<-utf8> option is used (circa
2009).

Apart from this there's no control over the output charset.

=back
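
Entitized input, as suggested under C<input_charset> above, can be sketched
as follows.  The standalone C<entitize> function here is for illustration
only; it follows the same idea as the internal helper in
C<HTML::FormatExternal> of replacing everything outside printable ASCII by a
numeric entity (which includes newlines and other control characters).

```perl
use strict;
use warnings;

# Illustrative helper: replace each character outside printable ASCII
# (0x20 to 0x7E) by a decimal numeric entity.
sub entitize {
  my ($str) = @_;
  $str =~ s{([^\x20-\x7E])}{'&#'.ord($1).';'}eg;
  return $str;
}

# U+00E9 is 233 decimal, U+263A is 9786 decimal
my $html = entitize("<p>caf\x{E9} \x{263A}</p>");
print $html, "\n";   # prints <p>caf&#233; &#9786;</p>
```

The result is pure ASCII, so it reads the same under latin-1, UTF-8 or any
other ASCII superset the program might assume.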

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/Elinks.pm
# Copyright 2008, 2009, 2010, 2012, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.


# elinks.conf(5) describes the various config options
#
# Maybe:
#     --dump-charset UTF-8 ??
#


package HTML::FormatText::Elinks;
use 5.006;
use strict;
use warnings;
use URI::file;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

our $VERSION = 26;

use constant DEFAULT_LEFTMARGIN => 3;
use constant DEFAULT_RIGHTMARGIN => 77;

sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['elinks', '-version']);
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # eg. "ELinks 0.12pre2\n
  #      Built on Oct  2 2008 18:34:16"
  #
  $version =~ /^ELinks (.*)/i
    or $version =~ /^(.*)/;  # whole first line if format not recognised
  return $1 . substr($version,0,0);  # retain taintedness
}

sub _make_run {
  my ($class, $input_filename, $options) = @_;
  my @command = ('elinks', '-dump', '-force-html');

  #   if ($options->{'ansi_colour'}) {
  #     push @command, '-eval', 'set document.dump.color_mode=1';
  #   }

  if (defined $options->{'_width'}) {
    push @command,
      '-dump-width', $options->{'_width'},
        '-eval', 'set document.browse.margin_width=0';
  }

  if (my $input_charset = $options->{'input_charset'}) {
    $input_charset = _elinks_mung_charset ($input_charset);
    push @command,
      '-eval', ('set document.codepage.assume='
                . _quote_config_stringarg($input_charset)),
      '-eval', 'set document.codepage.force_assumed=1';

  }
  if (my $output_charset = $options->{'output_charset'}) {
    push @command, '-dump-charset', _elinks_mung_charset ($output_charset);
  }

  # 'elinks_options' not documented ...
  push @command, @{$options->{'elinks_options'} || []};

  # elinks takes any "-foo" to be an option (except a bare "-") and
  # there's no apparent "--" to end options (in its version 0.12pre5).
  # Filenames starting "http:" are rejected.
  # Turn into file:// using URI::file to ensure literal filename.
  #
  push @command, URI::file->new_abs($input_filename)->as_string;

  return (\@command);
}

# elinks (version 0.12pre2 at least) is picky about charset names in a
# similar fashion to the main "links" program (see Links.pm).  Turn
# "latin-1" into "latin1" here for convenience.
#
sub _elinks_mung_charset {
  my ($charset) = @_;
  $charset =~ s/^(latin)-([0-9]+)$/$1$2/i;
  return $charset;
}

# Return $str with quotes around it, and backslashed within it, suitable for
# use in an elinks config file or -eval of a config file line.
sub _quote_config_stringarg {
  my ($str) = @_;
  $str =~ s/'/\\'/g;  # ' -> \'
  return "'$str'";    # '$str' surrounding quotes
}

1;
__END__

=for stopwords HTML-FormatExternal elinks formatters Elinks 0.12pre2 multibyte charset utf-8 charsets latin-1 latin1 Ryde recode unibyte

=head1 NAME

HTML::FormatText::Elinks - format HTML as plain text using elinks

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::Elinks;
 $text = HTML::FormatText::Elinks->format_file ($filename);
 $text = HTML::FormatText::Elinks->format_string ($html_string);

 $formatter = HTML::FormatText::Elinks->new (rightmargin => 60);
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::Elinks> turns HTML into plain text using the C<elinks>
program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by elinks.

See C<HTML::FormatExternal> for the formatting functions and options, all of
which are supported by C<HTML::FormatText::Elinks> with the following
caveats.

=over 4

=item C<input_charset>

Elinks as of 0.12pre2 (Oct 2008) has various unibyte input charsets but the
only multibyte input charset it accepts is utf-8.  You could recode others
to utf-8 if necessary (but this module doesn't attempt to do that
automatically).

=back

Elinks can be a little picky about its charset names.  This module attempts
to ease that by for instance turning "latin-1" (not accepted) into "latin1"
(which is accepted).  A full name "ISO-8859-1" etc is accepted too.
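
The name adjustment amounts to a single substitution.  A minimal sketch,
with C<normalize_latin_name> as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: drop the hyphen from "latin-N" style names and leave
# every other charset name untouched.
sub normalize_latin_name {
  my ($charset) = @_;
  $charset =~ s/^(latin)-([0-9]+)$/$1$2/i;
  return $charset;
}

for my $name ('latin-1', 'Latin-15', 'ISO-8859-1', 'utf-8') {
  printf "%-10s -> %s\n", $name, normalize_latin_name($name);
}
```

Only the hyphenated "latin-N" spelling is changed, so full names such as
"ISO-8859-1" pass through as given.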

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2012, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/Vilistextum.pm
# Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.


# The long options like --version depend on vilistextum being built with
# <getopt.h> and getopt_long().  The single-letter options like -v are
# always available.


package HTML::FormatText::Vilistextum;
use 5.006;
use strict;
use warnings;
use Carp;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

# uncomment this to run the ### lines
# use Smart::Comments;


our $VERSION = 26;

# no left margin by default, no option to add it
use constant DEFAULT_LEFTMARGIN => 0;
use constant DEFAULT_RIGHTMARGIN => 76; # file text.c has breite=76

# return true if vilistextum has its -u "--output-utf-8" option
use constant::defer _have_output_utf8 => sub {
  my ($class) = @_;
  my $help = $class->_run_version (['vilistextum', '-help']);
  return (defined $help && $help =~ /\s-u[, ]/);
};

use constant _WIDE_INPUT_CHARSET => 'entitize';
sub _WIDE_OUTPUT_CHARSET {
  my ($class) = @_;
  return ($class->_have_output_utf8() ? 'UTF-8' : 'iso-8859-1');
}

sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['vilistextum', '-v']);
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # eg. "Vilistextum 2.6.9 (22.10.2006)"
  $version =~ m{^Vilistextum ([0-9][^ ]*)}i
    or $version =~ /^(.*)/;  # whole first line if format not recognised
  return $1 . substr($version,0,0);  # retain taintedness
}

sub _make_run {
  my ($self, $input_filename, $options) = @_;
  my @command = ('vilistextum');

  if (defined $options->{'_width'}) {
    push @command, '-w', $options->{'_width'};
  }

  if ($options->{'output_charset'}) {
    if (lc($options->{'output_charset'}) eq 'utf-8') {
      # If asked for utf-8 and no multibyte then don't want to silently give
      # back latin-1 instead.
      # Maybe it'd be better to use Encode.pm to convert.
      if (! $self->_have_output_utf8()) {
        croak "Output charset $options->{'output_charset'} not available, vilistextum built without multibyte";
      }
      push @command, '-u';

    } else {
      # Not sure about croaking on unknown charset.
      # if $output_charset ne 'latin-1' 'iso-8859-1'
      # croak "Output charset $options->{'output_charset'} unknown";
    }
  }

  # 'vilistextum_options' not documented ...
  push @command, @{$options->{'vilistextum_options'} || []};

  # "-" means to stdout
  push @command, $input_filename, '-';

  return (\@command);
}

1;
__END__

=for stopwords HTML-FormatExternal vilistextum sourceforge.net formatters charset Ryde UTF

=head1 NAME

HTML::FormatText::Vilistextum - format HTML as plain text using vilistextum

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::Vilistextum;
 $text = HTML::FormatText::Vilistextum->format_file ($filename);
 $text = HTML::FormatText::Vilistextum->format_string ($html_string);

 $formatter = HTML::FormatText::Vilistextum->new;
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::Vilistextum> turns HTML into plain text using the
C<vilistextum> program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by vilistextum.

See C<HTML::FormatExternal> for the formatting functions and options, with
the following caveats.

=over 4

=item C<input_charset>

There's no C<input_charset> option yet.  (C<vilistextum> has a C<-y> option
but it might be only a default, with the document C<< <meta> >> taking
precedence, whereas the intention of C<input_charset> is to override the
document.)

=item C<output_charset>

Charset "UTF-8" can be given for UTF-8 output.  This passes C<-u> to
C<vilistextum>, which is only available if built with C<--enable-multibyte>
(as of its version 2.6.9).

=back
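
As a rough illustration of the C<output_charset> handling described above,
the following standalone sketch shows how such a command line might be
assembled.  The function C<sketch_vilistextum_command> and its C<$have_utf8>
argument are hypothetical names used only for this example; the module
itself probes C<vilistextum> to decide whether C<-u> is available.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.  $have_utf8 stands in for
# the module's probe of whether vilistextum offers its -u option.
sub sketch_vilistextum_command {
  my ($input_filename, $options, $have_utf8) = @_;
  my @command = ('vilistextum');

  if (defined $options->{'_width'}) {
    push @command, '-w', $options->{'_width'};
  }
  my $charset = $options->{'output_charset'};
  if (defined $charset && lc($charset) eq 'utf-8') {
    # refuse rather than silently give back latin-1
    $have_utf8
      or die "Output charset $charset not available\n";
    push @command, '-u';
  }
  # "-" as the output filename means standard output
  return (@command, $input_filename, '-');
}

my @command = sketch_vilistextum_command
  ('/tmp/foo.html', { _width => 60, output_charset => 'UTF-8' }, 1);
print join(' ', @command), "\n";
# prints "vilistextum -w 60 -u /tmp/foo.html -"
```

Passing a false C<$have_utf8> makes the sketch die instead of quietly
falling back to latin-1, which is the behaviour the caveat above describes.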

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/Links.pm
# Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.

package HTML::FormatText::Links;
use 5.006;
use strict;
use warnings;
use URI::file;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

# uncomment this to run the ### lines
# use Smart::Comments;


our $VERSION = 26;

use constant DEFAULT_LEFTMARGIN => 3;
use constant DEFAULT_RIGHTMARGIN => 77;
use constant _WIDE_INPUT_CHARSET => 'entitize';
use constant _WIDE_OUTPUT_CHARSET => 'iso-8859-1';

# It seems maybe some people make "links" an alias for "elinks", and the
# latter doesn't have -html-margin.  Maybe it'd be worth adapting to use
# elinks style "set document.browse.margin_width=0" in that case, but for
# now just don't use it if it doesn't work.
#
use constant::defer _have_html_margin => sub {
  my ($class) = @_;
  my $help = $class->_run_version (['links', '-help']);
  return (defined $help && $help =~ /-html-margin/);
};

sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['links', '-version']);
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # first line like "Links 1.00pre12" or "Links 2.2"
  $version =~ /^Links (.*)/i
    or $version =~ /^(.*)/;  # whole first line if format not recognised
  return $1 . substr($version,0,0);  # retain taintedness
}

sub _make_run {
  my ($class, $input_filename, $options) = @_;
  my @command = ('links', '-dump', '-force-html');

  if (defined $options->{'_width'}) {
    push @command, '-width', $options->{'_width'};
    if ($class->_have_html_margin) {
      push @command, '-html-margin', 0;
    }
  }

  if (my $input_charset = $options->{'input_charset'}) {
    push @command,
      '-html-assume-codepage', _links_mung_charset ($input_charset),
        '-html-hard-assume', 1;
  }
  if (my $output_charset = $options->{'output_charset'}) {
    push @command, '-codepage', _links_mung_charset ($output_charset);
  }

  # 'links_options' not documented ...
  push @command, @{$options->{'links_options'} || []};

  # links interprets "%" in the input filename as URI style %ff hex
  # encodings.  Turn unusual filenames like "%" or "-" into full
  # file:// using URI::file.
  push @command, URI::file->new_abs($input_filename)->as_string;

  return (\@command);
}

# links (version 2.2 at least) accepts "latin1" but not "latin-1".  The
# latter is accepted by the other FormatExternal programs, so turn "latin-1"
# into "latin1" for convenience.
#
sub _links_mung_charset {
  my ($charset) = @_;
  $charset =~ s/^(latin)-([0-9]+)$/$1$2/i;
  return $charset;
}


1;
__END__

=for stopwords HTML-FormatExternal formatters charset UTF-8 unicode latin-1 latin1 Ryde

=head1 NAME

HTML::FormatText::Links - format HTML as plain text using links

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::Links;
 $text = HTML::FormatText::Links->format_file ($filename);
 $text = HTML::FormatText::Links->format_string ($html_string);

 $formatter = HTML::FormatText::Links->new (rightmargin => 60);
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::Links> turns HTML into plain text using the C<links>
program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by links.  See C<HTML::FormatExternal> for the
formatting functions and options, all of which are supported by
C<HTML::FormatText::Links>, with the following caveats.

=over 4

=item C<leftmargin>, C<rightmargin>

In past versions of links without the C<-html-margin> option you always get
an extra 3 spaces within the requested left and right margins.

=item C<input_charset>, C<output_charset>

An output charset requires Links 2.0 or higher (or some such version), and
as of 2.2 the output cannot be UTF-8 (though the input can be).  Various
unicode inputs are turned into reasonable output though, for example smiley
face U+263A becomes ":-)".

=back

Links can be a bit picky about its charset names.  This module attempts to
ease that by for instance turning "latin-1" (not accepted) into "latin1"
(which is accepted).  A full "ISO-8859-1" etc is accepted too.
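
The charset name adjustment can be sketched in isolation as below.  The
function name C<sketch_mung_charset> is hypothetical and used only for this
example; the substitution is the same one the module applies internally.

```perl
use strict;
use warnings;

# Standalone copy of the charset name adjustment, for illustration.
sub sketch_mung_charset {
  my ($charset) = @_;
  $charset =~ s/^(latin)-([0-9]+)$/$1$2/i;   # "latin-1" -> "latin1"
  return $charset;
}

foreach my $charset ('latin-1', 'Latin-2', 'ISO-8859-1', 'latin1') {
  printf "%-10s => %s\n", $charset, sketch_mung_charset($charset);
}
```

Only names of the form "latin-N" are changed.  Anything else, including a
full "ISO-8859-1", is passed to links as given.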

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/Zen.pm
# Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.

package HTML::FormatText::Zen;
use 5.006;
use strict;
use warnings;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

our $VERSION = 26;

use constant DEFAULT_LEFTMARGIN => 0;
use constant DEFAULT_RIGHTMARGIN => 80;

# no input charset options
use constant _WIDE_INPUT_CHARSET => 'entitize';

sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['zen', '--version']);
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # eg. "zen version 0.2.3"
  $version =~ /^zen version (.*)/i
    or $version =~ /^(.*)/;  # whole first line if format not recognised
  return $1 . substr($version,0,0);  # retain taintedness
}

sub _make_run {
  my ($class, $input_filename, $options) = @_;

  # Is it worth enforcing/checking this ?
  # Could use Encode.pm to convert the output without too much trouble.
  #
  #   if (my $input_charset = $options->{'input_charset'}) {
  #     $input_charset =~ /^latin-?1$|^iso-?8859-1$/i
  #       or croak "Zen only accepts latin-1 input";
  #   }
  #   if (my $output_charset = $options->{'output_charset'}) {
  #     $output_charset =~ /^latin-?1$|^iso-?8859-1$/i
  #       or croak "Zen only produces latin-1 output";
  #   }

  # 'zen_options' not documented ...
  return ([ 'zen', '-i', 'dump',
            @{$options->{'zen_options'} || []},
            '--',  # end of options
            $input_filename,
          ]);
}

1;
__END__

=for stopwords HTML-FormatExternal formatters charset latin-1 Ryde

=head1 NAME

HTML::FormatText::Zen - format HTML as plain text using zen

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::Zen;
 $text = HTML::FormatText::Zen->format_file ($filename);
 $text = HTML::FormatText::Zen->format_string ($html_string);

 $formatter = HTML::FormatText::Zen->new;
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::Zen> turns HTML into plain text using the C<zen>
program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by zen.

See C<HTML::FormatExternal> for the formatting functions and options, with
the following caveats.

=over 4

=item C<rightmargin>

As of zen version 0.2.3 there is no right margin option.

=item C<input_charset>, C<output_charset>

As of zen version 0.2.3 the input charset is always latin-1 and output is
always latin-1.  Entities in the input seem to be truncated to 8 bits for
the output.

=back

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/W3m.pm
# Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.

package HTML::FormatText::W3m;
use 5.006;
use strict;
use warnings;
use URI::file;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

our $VERSION = 26;

use constant DEFAULT_LEFTMARGIN => 0;
use constant DEFAULT_RIGHTMARGIN => 80;

sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['w3m', '-version']);
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # eg. "w3m version w3m/0.5.2, options lang=en,m17n,image,color,..."
  $version =~ m{^w3m version (?:w3m/)?(.*?),}i
    or $version =~ /^(.*)/;  # whole first line if format not recognised
  return $1 . substr($version,0,0);  # retain taintedness
}

sub _make_run {
  my ($class, $input_filename, $options) = @_;
  my @command = ('w3m', '-dump', '-T', 'text/html');

  # w3m seems to use one less than the given -cols, presumably designed with
  # a tty in mind so "-cols 80" prints just 79 so as not to wrap around
  if (defined $options->{'_width'}) {
    push @command, '-cols', $options->{'_width'} + 1;
  }

  if ($options->{'input_charset'}) {
    push @command, '-I', $options->{'input_charset'};
  }
  if ($options->{'output_charset'}) {
    push @command, '-O', $options->{'output_charset'};
  }

  # 'w3m_options' not documented ...
  push @command, @{$options->{'w3m_options'} || []};

  # w3m (circa its version 0.5.3) interprets "%" in the input
  # filename as URI style %ff hex encodings.  Turn unusual filenames
  # like "%" into full file:// using URI::file.
  #
  # Filenames merely starting "-" can be given as "./-" etc to avoid
  # them being interpreted as options.  The file:// does this too.
  #
  push @command, URI::file->new_abs($input_filename)->as_string;

  return (\@command);
}

sub new {
  my ($class, %self) = @_;
  return bless \%self, $class;
}
sub format {
  my ($self, $html) = @_;
  if (ref $html) { $html = $html->as_HTML; }
  return $self->format_string ($html, %$self);
}

1;
__END__

=for stopwords HTML-FormatExternal formatters Ryde

=head1 NAME

HTML::FormatText::W3m - format HTML as plain text using w3m

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::W3m;
 $text = HTML::FormatText::W3m->format_file ($filename);
 $text = HTML::FormatText::W3m->format_string ($html_string);

 $formatter = HTML::FormatText::W3m->new (rightmargin => 60);
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::W3m> turns HTML into plain text using the C<w3m> program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by w3m.

See C<HTML::FormatExternal> for the formatting functions and options, all of
which are supported by C<HTML::FormatText::W3m>.
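
One detail handled internally is that w3m appears to print one column fewer
than its C<-cols> value, so the module asks for the requested width plus
one.  A minimal sketch of that adjustment follows, where the function name
C<sketch_w3m_command> is hypothetical and the charset and extra options are
left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: the w3m arguments for a desired text width.
sub sketch_w3m_command {
  my ($input_url, $width) = @_;
  my @command = ('w3m', '-dump', '-T', 'text/html');
  if (defined $width) {
    push @command, '-cols', $width + 1;   # w3m uses one less than -cols
  }
  return (@command, $input_url);
}

print join(' ', sketch_w3m_command('file:///tmp/foo.html', 60)), "\n";
# prints "w3m -dump -T text/html -cols 61 file:///tmp/foo.html"
```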

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/Lynx.pm
# Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.

package HTML::FormatText::Lynx;
use 5.006;
use strict;
use warnings;
use URI::file;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

our $VERSION = 26;

use constant DEFAULT_LEFTMARGIN => 2;
use constant DEFAULT_RIGHTMARGIN => 72;

# return true if the "-nomargins" option is available (new in Lynx
# 2.8.6dev.12 from June 2005)
use constant::defer _have_nomargins => sub {
  my ($class) = @_;
  my $help = $class->_run_version (['lynx', '-help']);
  return (defined $help && $help =~ /-nomargins/);
};

sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['lynx', '-version']);
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # eg. "Lynx Version 2.8.7dev.10 (21 Sep 2008)"
  $version =~ /^Lynx Version (.*?) \(/i
    or $version =~ /^(.*)/;  # whole first line if format not recognised
  return $1 . substr($version,0,0);  # retain taintedness
}

sub _make_run {
  my ($class, $input_filename, $options) = @_;
  my @command = ('lynx', '-dump', '-force_html');

  if (defined $options->{'_width'}) {
    push @command, '-width', $options->{'_width'};
    if ($class->_have_nomargins) {
      push @command, '-nomargins';
    }
  }
  if (my $input_charset = $options->{'input_charset'}) {
    push @command, '-assume_charset', $input_charset;
  }
  if (my $output_charset = $options->{'output_charset'}) {
    push @command, '-display_charset', $output_charset;
  }
  if ($options->{'justify'}) {
    push @command, '-justify';
  }
  if ($options->{'unique_links'}) {
    push @command, '-unique_urls';
  }


  # -underscore gives _foo_ style for <u> underline, though it seems to need
  # -with_backspaces to come out.  It doesn't use backspaces it seems,
  # unlike the name would suggest ...

  # 'lynx_options' not documented ...
  push @command, @{$options->{'lynx_options'} || []};

  # "lynx -" means read standard input.
  # Any other "-foo" is an option.
  # Recent lynx has "--" to mean end of options, but not circa 2.8.6.
  # "lynx dir/http:" attempts to connect to something.
  # Escape all this by URI::file.
  push @command, URI::file->new_abs($input_filename)->as_string;

  return (\@command);
}

1;
__END__

=for stopwords HTML-FormatExternal formatters latin-1 iso-8859-1 boolean Ryde eg

=head1 NAME

HTML::FormatText::Lynx - format HTML as plain text using lynx

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::Lynx;
 $text = HTML::FormatText::Lynx->format_file ($filename);
 $text = HTML::FormatText::Lynx->format_string ($html_string);

 $formatter = HTML::FormatText::Lynx->new (rightmargin => 60);
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::Lynx> turns HTML into plain text using the C<lynx> program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by lynx.

See C<HTML::FormatExternal> for the formatting functions and options, all of
which are supported by C<HTML::FormatText::Lynx>, with the following caveats.

=over 4

=item C<leftmargin>, C<rightmargin>

Prior to the C<-nomargins> option of Lynx 2.8.6dev.12 (June 2005) an
additional 3 space margin is always applied within the requested left and
right positions.

=item C<input_charset>, C<output_charset>

Note that "latin-1" etc is not accepted, it must be "iso-8859-1" etc.

C<output_charset> becomes the C<-display_charset> option and can't be used
on very old C<lynx> which doesn't have that option (eg. lynx circa 2.8.1).
Perhaps in the future C<output_charset> could be dropped if it's already
what will be output, or throw a Perl error when unsupported.

=back

=head2 Extra Options

=over 4

=item C<justify> (boolean)

If true then C<-justify> is passed to lynx to have all lines in the paragraph
padded out with extra spaces to the given C<rightmargin> (or default right
margin).

=item C<unique_links> (boolean)

If true then C<-unique_urls> is passed to have lynx give its link footnotes
just once for each distinct URL, re-used when the same URL occurs more than
once in the document.  This module option is per
L.

=back
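
The mapping from these boolean options to lynx command line flags can be
sketched as follows.  The function name C<sketch_lynx_extra_flags> is
hypothetical, introduced only for this example, and the sketch covers just
the two extra options described above.

```perl
use strict;
use warnings;

# Hypothetical helper showing the boolean options to lynx flags mapping.
sub sketch_lynx_extra_flags {
  my (%options) = @_;
  my @flags;
  push @flags, '-justify'     if $options{'justify'};
  push @flags, '-unique_urls' if $options{'unique_links'};
  return @flags;
}

print join(' ', sketch_lynx_extra_flags (justify => 1, unique_links => 1)), "\n";
# prints "-justify -unique_urls"
```

In actual use the options are given like any other formatting option, for
example C<< HTML::FormatText::Lynx->format_string ($html_string, justify => 1) >>.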

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/lib/HTML/FormatText/Netrik.pm
# Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal.  If not, see <http://www.gnu.org/licenses/>.

package HTML::FormatText::Netrik;
use 5.006;
use strict;
use warnings;
use URI::file;
use HTML::FormatExternal;
our @ISA = ('HTML::FormatExternal');

# uncomment this to run the ### lines
# use Smart::Comments;

our $VERSION = 26;

use constant DEFAULT_LEFTMARGIN => 3;
use constant DEFAULT_RIGHTMARGIN => 77;

# as of Netrik 1.16.1 there's no input charsets, so entitize
use constant _WIDE_INPUT_CHARSET => 'entitize';

# --dump here as otherwise netrik runs the curses interface on the initial
# page given in ~/.netrikrc.  If there's no such file then it prints a
# little "usage: netrik html-file" but there's nothing interesting in that.
# Option '-' to read stdin which _run_version() makes /dev/null.
#
# --bw avoids warnings on a monochrome terminal.  Don't want colours for any
# usage message etc anyway.
#
sub program_full_version {
  my ($self_or_class) = @_;
  return $self_or_class->_run_version (['netrik','--bw','--version','--dump','-'], '2>&1');
}
sub program_version {
  my ($self_or_class) = @_;
  my $version = $self_or_class->program_full_version;
  if (! defined $version) { return undef; }

  # as of netrik 1.16.1 there doesn't seem to be any option that prints the
  # version number, it's possible it's not compiled into the binary at all
  return '(not reported)';
}

sub _make_run {
  my ($class, $input_filename, $options) = @_;
  ### Netrik _make_run() ...

  #   if (! $options->{'ansi_colour'}) {
  #     push @command, '--bw';
  #   }

  # COLUMNS influences the curses tigetnum("cols") used under --term-width.
  # Slightly hairy, but it has the right effect.
  if (defined $options->{'_width'}) {
    $options->{'ENV'}->{'COLUMNS'} = $options->{'_width'};
  }

  # netrik 1.16.1 does a curses setupterm() even for a --dump so it must
  # have a TERM.  Think "TERM=dumb" is known to any termcap or terminfo.
  # But leave a user's existing TERM setting alone in case it does something
  # good for netrik, though you'd hope it wouldn't affect --dump.
  #
  unless ($ENV{'TERM'}) {
    $options->{'ENV'}->{'TERM'} = 'dumb';
  }

  # --bw to avoid warnings when on a monochrome terminal.  Don't want
  # colours in a dump anyway.  (Option --bw is in options.txt and the
  # README.)
  #
  # 'netrik_options' not documented ...
  return ([ 'netrik', '--dump', '--bw',
            @{$options->{'netrik_options'} || []},

            # netrik interprets "%" in the input filename as URI style %ff hex
            # encodings.  And it rejects filenames with non-URI chars such as
            # "-" (except for "-" alone which means stdin).  Turn unusual
            # filenames like "%" or "-" into full file:// using URI::file.
            URI::file->new_abs($input_filename)->as_string,
          ]);
}

1;
__END__

=for stopwords HTML-FormatExternal netrik sourceforge.net formatters charset Ryde

=head1 NAME

HTML::FormatText::Netrik - format HTML as plain text using netrik

=for test_synopsis my ($text, $filename, $html_string, $formatter, $tree)

=head1 SYNOPSIS

 use HTML::FormatText::Netrik;
 $text = HTML::FormatText::Netrik->format_file ($filename);
 $text = HTML::FormatText::Netrik->format_string ($html_string);

 $formatter = HTML::FormatText::Netrik->new;
 $tree = HTML::TreeBuilder->new_from_file ($filename);
 $text = $formatter->format ($tree);

=head1 DESCRIPTION

C<HTML::FormatText::Netrik> turns HTML into plain text using the C<netrik>
program.

=over 4

L

=back

The module interface is compatible with formatters like C<HTML::FormatText>,
but all parsing etc is done by netrik.

C<netrik> normally emits colour escape sequences but that is disabled here
(its C<--bw> option) to get plain text.

See C<HTML::FormatExternal> for the formatting functions and options, with
the following caveats.

=over 4

=item C<input_charset>, C<output_charset>

These charset overrides have no effect.  Input might be single-byte only,
and output probably follows the input (as of netrik 1.15.7).

=back

=head1 BUGS

C<netrik> version 1.16.1 initializes curses even when doing just a
C<--dump>, so if you have a C<TERM> environment variable then it must be a
terminal type known to curses (L).  If you have no C<TERM>
setting then C<HTML::FormatText::Netrik> runs C<netrik> with C<TERM=dumb> so
the code here works in a bare environment.  (But no attempt is made here to
validate or correct an existing C<TERM> value.)
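
The environment handling just described can be sketched roughly as below.
The function name C<sketch_netrik_env> is hypothetical, and its hashref
argument stands in for C<%ENV> so the example can run without netrik
installed.

```perl
use strict;
use warnings;

# Hypothetical helper: environment overrides for running netrik.
# $env is a hashref standing in for %ENV, $width the desired text width.
sub sketch_netrik_env {
  my ($env, $width) = @_;
  my %override;
  if (defined $width) {
    $override{'COLUMNS'} = $width;   # curses takes the width from COLUMNS
  }
  unless ($env->{'TERM'}) {
    $override{'TERM'} = 'dumb';      # fallback when no TERM is set
  }
  return \%override;
}

my $override = sketch_netrik_env ({}, 60);
print join(' ', map {"$_=$override->{$_}"} sort keys %$override), "\n";
# prints "COLUMNS=60 TERM=dumb"
```

An existing C<TERM> is left untouched in this sketch, matching the note
above that no attempt is made to validate or correct it.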

=cut

# If you set something then it should be a known terminal type.  (C
# uses the terminal type for colour escapes, which are disabled here, and as
# a final default text width if neither C nor
# C).

=pod

=head1 SEE ALSO

L, L

=head1 HOME PAGE

L

=head1 LICENSE

Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde

HTML-FormatExternal is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 3, or (at your option) any later
version.

HTML-FormatExternal is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
HTML-FormatExternal.  If not, see L<http://www.gnu.org/licenses/>.

=cut
HTML-FormatExternal-26/t/test.html



 This is the title text.



 

This is the body text.

HTML-FormatExternal-26/t/FormatExternal.t0000755000175000017500000003024612570233162016150 0ustar gggg#!/usr/bin/perl -w # Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde # This file is part of HTML-FormatExternal. # # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use 5.006; use strict; use warnings; use FindBin; use HTML::FormatExternal; use Test::More tests => 284; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings() } if (defined $ENV{PATH}) { ($ENV{PATH}) = ($ENV{PATH} =~ /(.*)/); # untaint so programs can run } { my $want_version = 26; is ($HTML::FormatExternal::VERSION, $want_version, 'VERSION variable'); is (HTML::FormatExternal->VERSION, $want_version, 'VERSION class method'); ok (eval { HTML::FormatExternal->VERSION($want_version); 1 }, "VERSION class check $want_version"); my $check_version = $want_version + 1000; ok (! eval { HTML::FormatExternal->VERSION($check_version); 1 }, "VERSION class check $check_version"); } # Cribs: # # Test::More::like() ends up spinning a qr// through a further /$re/ which # loses any /m modifier, prior to perl 5.10.0 at least. So /m is avoided in # favour of some (^|\n) and ($|{\r\n]) forms. sub is_undef_or_string { my ($obj) = @_; if (! defined $obj) { return 1; } if (ref $obj) { return 0; } if ($obj eq '') { return 0; } # disallow empty return 1; } sub is_undef_or_one_line_string { my ($obj) = @_; if (! 
defined $obj) { return 1; } if (ref $obj) { return 0; } if ($obj eq '') { return 0; } # disallow empty if ($obj =~ /\n/) { return 0; } return 1; } my $colon_is_ordinary; foreach my $class ('HTML::FormatText::Elinks', 'HTML::FormatText::Html2text', 'HTML::FormatText::Links', 'HTML::FormatText::Lynx', 'HTML::FormatText::Netrik', 'HTML::FormatText::W3m', 'HTML::FormatText::Zen', ) { diag $class; use_ok ($class); is ($class->VERSION, $HTML::FormatExternal::VERSION, "$class VERSION method"); is (do { no strict 'refs'; ${"${class}::VERSION"} }, $HTML::FormatExternal::VERSION, "$class VERSION variable"); # # program_full_version() # { my $version = $class->program_full_version; require Data::Dumper; diag ("$class program_full_version ", Data::Dumper::Dumper($version)); ok (is_undef_or_string($version), 'program_full_version() from class'); } { my $formatter = $class->new; my $version = $formatter->program_full_version; ok (is_undef_or_string($version), 'program_full_version() from obj'); } # # program_version() # { my $version = $class->program_version(); require Data::Dumper; diag ("$class program_version ", Data::Dumper::Dumper($version)); ok (is_undef_or_one_line_string($version), "$class program_version() from class"); } { my $formatter = $class->new; my $version = $formatter->program_version(); ok (is_undef_or_one_line_string($version), "$class program_version() from obj"); } foreach my $method ('_have_nomargins', '_have_html_margin', '_have_ascii') { if ($class->can($method)) { diag "$class $method() ",($class->$method ? "yes" : "no"); } } SKIP: { if (! 
defined $class->program_full_version) { skip "$class program not available", 33; } { my $str = $class->format_string ('Hello'); like ($str, qr/Hello/, "$class through class"); } { my $formatter = $class->new; my $str = $formatter->format ('Hello'); like ($str, qr/Hello/, "$class through formatter object"); } SKIP: { eval { require HTML::TreeBuilder } or skip 'HTML::TreeBuilder not available', 1; my $tree = HTML::TreeBuilder->new_from_content ('Hello'); my $formatter = $class->new; my $str = $formatter->format ($tree); like ($str, qr/Hello/, "$class through formatter object on TreeBuilder"); } SKIP: { if ($class =~ /Lynx/ && ! $class->_have_nomargins()) { skip "this Lynx doesn't have -nomargins", 1; } if ($class =~ /Links/ && ! $class->_have_html_margin()) { skip "this links doesn't have -html-margin", 1; } my $str = $class->format_string ('Hello', leftmargin => 0); like ($str, qr/(^|\n)Hello/, # allowing for leading blank lines "$class through class, with leftmargin 0"); } SKIP: { if ($class =~ /Zen/) { skip "$class doesn't support rightmargin", 1; } if ($class =~ /Lynx/ && ! $class->_have_nomargins()) { skip "this Lynx doesn't have -nomargins", 1; } if ($class =~ /Links/ && ! 
$class->_have_html_margin()) { skip "this links doesn't have -html-margin", 1; } my $html = '123 567 9012 abc def ghij'; my $str = $class->format_string ($html, leftmargin => 0, rightmargin => 12); { require Data::Dumper; my $dumper = Data::Dumper->new([$str],['output']); $dumper->Useqq (1); diag ($dumper->Dump); } like ($str, qr/(^|\n)123 567 9012($|[\r\n])/, "$class through class, with leftmargin 0 rightmargin 12"); } foreach my $data ([ 'ascii', '', '<html><body><a href="page.html">Foo</a></body></html>', 'http://foo.org/page.html' ], [ 'utf16le', "\377\376", "<\0h\0t\0m\0l\0>\0<\0b\0o\0d\0y\0>\0<\0a\0 \0h\0r\0e\0f\0=\0\"\0p\0a\0g\0e\0.\0h\0t\0m\0l\0\"\0>\0F\0o\0o\0<\0/\0a\0>\0<\0/\0b\0o\0d\0y\0>\0<\0/\0h\0t\0m\0l\0>\0", 'http://foo.org/page.html' ], [ 'utf16be', "\376\377", "\0<\0h\0t\0m\0l\0>\0<\0b\0o\0d\0y\0>\0<\0a\0 \0h\0r\0e\0f\0=\0\"\0p\0a\0g\0e\0.\0h\0t\0m\0l\0\"\0>\0F\0o\0o\0<\0/\0a\0>\0<\0/\0b\0o\0d\0y\0>\0<\0/\0h\0t\0m\0l\0>", 'http://foo.org/page.html' ], ['utf32le', "\377\376\0\0", "<\0\0\0h\0\0\0t\0\0\0m\0\0\0l\0\0\0>\0\0\0<\0\0\0b\0\0\0o\0\0\0d\0\0\0y\0\0\0>\0\0\0<\0\0\0a\0\0\0 \0\0\0h\0\0\0r\0\0\0e\0\0\0f\0\0\0=\0\0\0\"\0\0\0p\0\0\0a\0\0\0g\0\0\0e\0\0\0.\0\0\0h\0\0\0t\0\0\0m\0\0\0l\0\0\0\"\0\0\0>\0\0\0F\0\0\0o\0\0\0o\0\0\0<\0\0\0/\0\0\0a\0\0\0>\0\0\0<\0\0\0/\0\0\0b\0\0\0o\0\0\0d\0\0\0y\0\0\0>\0\0\0<\0\0\0/\0\0\0h\0\0\0t\0\0\0m\0\0\0l\0\0\0>\0\0\0", 'http://foo.org/page.html' ], [ 'utf32be', "\0\0\376\377", "\0\0\0<\0\0\0h\0\0\0t\0\0\0m\0\0\0l\0\0\0>\0\0\0<\0\0\0b\0\0\0o\0\0\0d\0\0\0y\0\0\0>\0\0\0<\0\0\0a\0\0\0 \0\0\0h\0\0\0r\0\0\0e\0\0\0f\0\0\0=\0\0\0\"\0\0\0p\0\0\0a\0\0\0g\0\0\0e\0\0\0.\0\0\0h\0\0\0t\0\0\0m\0\0\0l\0\0\0\"\0\0\0>\0\0\0F\0\0\0o\0\0\0o\0\0\0<\0\0\0/\0\0\0a\0\0\0>\0\0\0<\0\0\0/\0\0\0b\0\0\0o\0\0\0d\0\0\0y\0\0\0>\0\0\0<\0\0\0/\0\0\0h\0\0\0t\0\0\0m\0\0\0l\0\0\0>", 'http://foo.org/page.html' ], ) { my ($charset, $charset_bom, $html, $want_str) = @$data; my $want_re = qr/\Q$want_str/; foreach my $bom ('', ($charset_bom ?
($charset_bom) : ())) { SKIP: { # html2text -- doesn't show link targets # links -- doesn't show link targets # w3m -- doesn't show link targets # zen -- shows only the plain href part, doesn't expand if ($class !~ /Elinks|Lynx/) { skip "$class doesn't display absolutized links", 2; } if ($charset ne 'ascii' && $class =~ /Elinks/) { skip "$class only takes 8-bit input (as of 0.12pre5)", 2; } $html = "$bom$html"; my @input_charset = ($bom ? (input_charset => $charset) : ()); my $desc = "$class base, $charset, ".($bom?'bom':'input_charset'); { my $str = $class->format_string ($html, base => 'http://foo.org', output_charset => 'us-ascii', @input_charset); # require Data::Dumper; # diag "$charset ", Data::Dumper->new([\$str],['str'])->Dump; like ($str, $want_re, "format_string() $desc"); } { require File::Temp; my $fh = File::Temp->new (SUFFIX => '.html'); $fh->autoflush(1); binmode($fh) or die 'Cannot set binmode on temp file for test'; print $fh $html or die 'Cannot write temp file for test'; my $filename = $fh->filename; my $str = $class->format_file ($filename, base => 'http://foo.org', output_charset => 'us-ascii', @input_charset); like ($str, $want_re, "format_file() $desc"); } } } } # Exercise some strange filenames which might provoke the formatter # programs. # { require File::Spec; my $testfilename = File::Spec->catfile($FindBin::Bin,'test.html'); if (! defined $colon_is_ordinary) { my $is_absolute = File::Spec->splitpath('C:/FOO'); my ($volume,$directories,$file) = File::Spec->splitpath('C:FOO'); $colon_is_ordinary = ($volume eq '' && $directories eq '' && ! $is_absolute ?
1 : 0); }; diag "Colon character is ordinary in filenames: ", $colon_is_ordinary; require File::Temp; require File::Copy; my $tempdir_object = File::Temp->newdir; my $tempdir_name = $tempdir_object->dirname; diag "Temporary directory ",$tempdir_name; foreach my $filename ('http:', '-', '-###', '%57', 'a/b', # filename with "/" probably uncreatable ) { my $fullname = File::Spec->catfile($tempdir_name,$filename); SKIP: { # Don't attempt colon on Mac, MS-DOS, OS/2 etc where it's a volume # or directory separator. # # Cygwin translates ":" and other characters special to Windows to # some Unicode private chars, allowing them to be used # https://cygwin.com/cygwin-ug-net/using-specialnames.html # But that depends on the external program being a cygwin build # too, which is likely but not certain. # if ($filename =~ /:/ && ! $colon_is_ordinary) { skip "Colon is not an ordinary filename character here, not creating $fullname", 2; } # might be impossible to create a file with a slash like "a/b" File::Copy::copy($testfilename, $fullname) or skip "Cannot copy test file to $fullname: $!", 2; { my $str = $class->format_file ($fullname); like ($str, qr/body.*text/, "$class format_file() filename \"$fullname\""); } require Cwd; my $old_dir = Cwd::getcwd(); ($old_dir) = ($old_dir =~ /(.*)/); # untaint chdir $tempdir_name or die "Oops, cannot chdir to $tempdir_name"; { my $str = $class->format_file ($filename); like ($str, qr/body.*text/, "$class format_file() filename \"$filename\""); } chdir $old_dir or die "Oops, cannot chdir back to $old_dir"; } # actually File::Temp removes files in its temporary directory anyway unlink $fullname; } } } } exit 0; HTML-FormatExternal-26/t/wide.t0000644000175000017500000000654612515613461014153 0ustar gggg#!/usr/bin/perl -w # Copyright 2015 Kevin Ryde # This file is part of HTML-FormatExternal.
# # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use 5.006; use strict; use warnings; use FindBin; use File::Spec; use Test::More; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings() } if (defined $ENV{PATH}) { ($ENV{PATH}) = ($ENV{PATH} =~ /(.*)/); # untaint so programs can run } eval { my $str = ''; utf8::upgrade($str); utf8::is_utf8($str) } or plan skip_all => "due to no wide chars in this perl"; # A couple of the tests depend on w3m giving U+263A smiley face back as that # utf-8 character, or not as that character if asked for ascii output. # Should be good with any reasonably recent w3m. # require HTML::FormatText::W3m; my $class = 'HTML::FormatText::W3m'; diag $class; unless ($class->program_full_version) { plan skip_all => "due to $class program not available"; } plan tests => 13; my $smiley = 0x263A; #------------------------------------------------------------------------------ # format_string() { my $html = 'Hello '.chr($smiley).' '; ok (utf8::is_utf8($html), "input is wide"); my $text = $class->format_string ($html); ok (utf8::is_utf8($text), "format_string() wide input gives wide output"); like ($text, qr/[^[:ascii:]]/, "format_string() wide input gives wide output -- has non-ascii"); } { my $html = 'Hello'; ok (! 
utf8::is_utf8($html), "input not wide"); my $text = $class->format_string ($html, output_wide => 1); ok (utf8::is_utf8($text), "format_string() output_wide forced on"); } { my $html = 'Hello '.chr($smiley).' '; ok (utf8::is_utf8($html), "input is wide"); my $text = $class->format_string ($html, output_wide => 0); ok (! utf8::is_utf8($text), "format_string() output_wide forced off"); } { my $html = 'Hello '.chr($smiley).' '; ok (utf8::is_utf8($html), "input is wide"); my $text = $class->format_string ($html, output_charset => 'ascii'); ok (utf8::is_utf8($text), "format_string() wide but output_charset ascii"); like ($text, qr/^[[:ascii:]]*$/, "format_string() wide but output_charset ascii -- contain ascii only"); } #------------------------------------------------------------------------------ # format_file() my $testfilename = File::Spec->catfile($FindBin::Bin,'test.html'); { my $text = $class->format_file ($testfilename); ok (! utf8::is_utf8($text), "format_file() output not wide"); } { my $text = $class->format_file ($testfilename, output_wide => 1); ok (utf8::is_utf8($text), "format_file() output_wide forced on"); } { my $text = $class->format_file ($testfilename, output_wide => 0); ok (! utf8::is_utf8($text), "format_file() output_wide forced off"); } exit 0; HTML-FormatExternal-26/t/Elinks.t0000755000175000017500000000414412560646140014443 0ustar gggg#!/usr/bin/perl -w # Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde # This file is part of HTML-FormatExternal. # # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. 
# # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use Test::More tests => 10; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings() } require HTML::FormatText::Elinks; { my $want_version = 26; is ($HTML::FormatText::Elinks::VERSION, $want_version, 'VERSION variable'); is (HTML::FormatText::Elinks->VERSION, $want_version, 'VERSION class method'); ok (eval { HTML::FormatText::Elinks->VERSION($want_version); 1 }, "VERSION class check $want_version"); my $check_version = $want_version + 1000; ok (! eval { HTML::FormatText::Elinks->VERSION($check_version); 1 }, "VERSION class check $check_version"); my $formatter = HTML::FormatText::Elinks->new; is ($formatter->VERSION, $want_version, 'VERSION object method'); ok (eval { $formatter->VERSION($want_version); 1 }, "VERSION object check $want_version"); ok (! eval { $formatter->VERSION($check_version); 1 }, "VERSION object check $check_version"); } ## no critic (ProtectPrivateSubs) #----------------------------------------------------------------------------- # _quote_config_stringarg() foreach my $data (['', "''"], ['abc', "'abc'"], ["x'y'z", "'x\\'y\\'z'"], ) { my ($str, $want) = @$data; is (HTML::FormatText::Elinks::_quote_config_stringarg($str), $want, "_quote_config_stringarg() '$str'"); } exit 0; HTML-FormatExternal-26/t/MyTestHelpers.pm0000644000175000017500000002474511757337371016157 0ustar gggg# MyTestHelpers.pm -- my shared test script helpers # Copyright 2008, 2009, 2010, 2011, 2012 Kevin Ryde # MyTestHelpers.pm is shared by several distributions. # # MyTestHelpers.pm is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by the # Free Software Foundation; either version 3, or (at your option) any later # version. 
# # MyTestHelpers.pm is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with this file. If not, see . BEGIN { require 5 } package MyTestHelpers; use strict; use Exporter; use vars qw(@ISA @EXPORT_OK %EXPORT_TAGS); # uncomment this to run the ### lines #use Smart::Comments; @ISA = ('Exporter'); @EXPORT_OK = qw(findrefs main_iterations warn_suppress_gtk_icon glib_gtk_versions any_signal_connections nowarnings); %EXPORT_TAGS = (all => \@EXPORT_OK); sub DEBUG { 0 } #----------------------------------------------------------------------------- { my $warning_count; my $stacktraces; my $stacktraces_count = 0; sub nowarnings_handler { my ($msg) = @_; # don't error out for cpan alpha version number warnings unless (defined $msg && $msg =~ /^Argument "[0-9._]+" isn't numeric in numeric gt/) { $warning_count++; if ($stacktraces_count < 3 && eval { require Devel::StackTrace }) { $stacktraces_count++; $stacktraces .= "\n" . Devel::StackTrace->new->as_string() . "\n"; } } warn @_; } sub nowarnings { $SIG{'__WARN__'} = \&nowarnings_handler; } END { if ($warning_count) { MyTestHelpers::diag ("Saw $warning_count warning(s):"); if (defined $stacktraces) { MyTestHelpers::diag ($stacktraces); } else { MyTestHelpers::diag('(Devel::StackTrace not available for backtrace)'); } MyTestHelpers::diag ('Exit code 1 for warnings'); $? 
= 1; } } } sub diag { if (do { local $@; eval { Test::More->can('diag') }}) { Test::More::diag (@_); } else { my $msg = join('', map {defined($_)?$_:'[undef]'} @_)."\n"; $msg =~ s/^/# /mg; print STDERR $msg; } } sub dump { my ($thing) = @_; if (eval { require Data::Dumper; 1 }) { MyTestHelpers::diag (Data::Dumper::Dumper ($thing)); } else { MyTestHelpers::diag ("Data::Dumper not available"); } } #----------------------------------------------------------------------------- # Test::Weaken and other weakening sub findrefs { my ($obj) = @_; defined $obj or return; require Scalar::Util; if (ref $obj && Scalar::Util::reftype($obj) eq 'HASH') { MyTestHelpers::diag ("Keys: ", join(' ', map {"$_=".(defined $obj->{$_} ? "$obj->{$_}" : '[undef]')} keys %$obj)); } if (eval { require Devel::FindRef }) { MyTestHelpers::diag (Devel::FindRef::track($obj, 8)); } else { MyTestHelpers::diag ("Devel::FindRef not available -- ", $@); } } sub test_weaken_show_leaks { my ($leaks) = @_; $leaks || return; my $unfreed = $leaks->unfreed_proberefs; my $unfreed_count = scalar(@$unfreed); MyTestHelpers::diag ("Test-Weaken leaks $unfreed_count objects"); MyTestHelpers::dump ($leaks); my $proberef; foreach $proberef (@$unfreed) { MyTestHelpers::diag (" unfreed ", $proberef); } foreach $proberef (@$unfreed) { MyTestHelpers::diag ("search ", $proberef); MyTestHelpers::findrefs($proberef); } } #----------------------------------------------------------------------------- # Gtk/Glib helpers # Gtk 2.16 can go into a hard loop on events_pending() / main_iteration_do() # if dbus is not running, or something like that. In any case limiting the # iterations is good for test safety.
# sub main_iterations { my $count = 0; if (DEBUG) { MyTestHelpers::diag ("main_iterations() ..."); } while (Gtk2->events_pending) { $count++; Gtk2->main_iteration_do (0); if ($count >= 500) { MyTestHelpers::diag ("main_iterations(): oops, bailed out after $count events/iterations"); return; } } MyTestHelpers::diag ("main_iterations(): ran $count events/iterations"); } # warn_suppress_gtk_icon() is a $SIG{__WARN__} handler which suppresses spam # from Gtk trying to make you buy the hi-colour icon theme. Eg, # # { # local $SIG{'__WARN__'} = \&MyTestHelpers::warn_suppress_gtk_icon; # $something = SomeThing->new; # } # sub warn_suppress_gtk_icon { my ($message) = @_; unless ($message =~ /Gtk-WARNING.*icon/ || $message =~ /\Qrecently-used.xbel/ ) { warn @_; } } sub glib_gtk_versions { my $gtk1_loaded = Gtk->can('init'); my $gtk2_loaded = Gtk2->can('init'); my $glib_loaded = Glib->can('get_home_dir'); if ($gtk1_loaded) { MyTestHelpers::diag ("Perl-Gtk1 version ",Gtk->VERSION); } if ($gtk2_loaded) { MyTestHelpers::diag ("Perl-Gtk2 version ",Gtk2->VERSION); } if ($glib_loaded) { # when loaded MyTestHelpers::diag ("Perl-Glib version ",Glib->VERSION); MyTestHelpers::diag ("Compiled against Glib version ", Glib::MAJOR_VERSION(), ".", Glib::MINOR_VERSION(), ".", Glib::MICRO_VERSION(), "."); MyTestHelpers::diag ("Running on Glib version ", Glib::major_version(), ".", Glib::minor_version(), ".", Glib::micro_version(), "."); } if ($gtk2_loaded) { MyTestHelpers::diag ("Compiled against Gtk version ", Gtk2::MAJOR_VERSION(), ".", Gtk2::MINOR_VERSION(), ".", Gtk2::MICRO_VERSION(), "."); MyTestHelpers::diag ("Running on Gtk version ", Gtk2::major_version(), ".", Gtk2::minor_version(), ".", Gtk2::micro_version(), "."); } if ($gtk1_loaded) { MyTestHelpers::diag ("Running on Gtk version ", Gtk->major_version(), ".", Gtk->minor_version(), ".", Gtk->micro_version(), "."); } } # Return true if there's any signal handlers connected to $obj. 
# # Signal IDs are from 1 up, don't pass 0 to signal_handler_is_connected() # since in Glib 2.4.1 it spits out a g_log() error. # sub any_signal_connections { my ($obj) = @_; my @connected = grep {$obj->signal_handler_is_connected ($_)} (1 .. 500); if (@connected) { my $connected = join(',',@connected); MyTestHelpers::diag ("$obj signal handlers connected: $connected"); return $connected; } return undef; } # wait for $signame to be emitted on $widget, with a timeout sub wait_for_event { my ($widget, $signame) = @_; if (DEBUG) { MyTestHelpers::diag ("wait_for_event() $signame on ",$widget); } my $done = 0; my $got_event = 0; my $sig_id = $widget->signal_connect ($signame => sub { if (DEBUG) { MyTestHelpers::diag ("wait_for_event() $signame received"); } $done = 1; return 0; # Gtk2::EVENT_PROPAGATE (new in Gtk2 1.220) }); my $timer_id = Glib::Timeout->add (30_000, # 30 seconds sub { $done = 1; MyTestHelpers::diag ("wait_for_event() oops, timeout waiting for $signame on ",$widget); return 1; # Glib::SOURCE_CONTINUE (new in Glib 1.220) }); if ($widget->can('get_display')) { # display new in Gtk 2.2 $widget->get_display->sync; } else { # in Gtk 2.0 gdk_flush() is a sync actually Gtk2::Gdk->flush; } my $count = 0; while (! $done) { if (DEBUG >= 2) { MyTestHelpers::diag ("wait_for_event() iteration $count"); } Gtk2->main_iteration; $count++; } MyTestHelpers::diag ("wait_for_event(): '$signame' ran $count events/iterations\n"); $widget->signal_handler_disconnect ($sig_id); Glib::Source->remove ($timer_id); } #----------------------------------------------------------------------------- # X11::Protocol helpers sub X11_chosen_screen_number { my ($X) = @_; my $i; foreach $i (0 .. 
$#{$X->{'screens'}}) { if ($X->{'screens'}->[$i]->{'root'} == $X->{'root'}) { return $i; } } die "Oops, current screen not found"; } sub X11_server_info { my ($X) = @_; MyTestHelpers::diag(""); MyTestHelpers::diag("X server info"); MyTestHelpers::diag("vendor: ",$X->{'vendor'}); MyTestHelpers::diag("release_number: ",$X->{'release_number'}); MyTestHelpers::diag("protocol_major_version: ",$X->{'protocol_major_version'}); MyTestHelpers::diag("protocol_minor_version: ",$X->{'protocol_minor_version'}); MyTestHelpers::diag("byte_order: ",$X->{'byte_order'}); MyTestHelpers::diag("num screens: ",scalar(@{$X->{'screens'}})); MyTestHelpers::diag("width_in_pixels: ",$X->{'width_in_pixels'}); MyTestHelpers::diag("height_in_pixels: ",$X->{'height_in_pixels'}); MyTestHelpers::diag("width_in_millimeters: ",$X->{'width_in_millimeters'}); MyTestHelpers::diag("height_in_millimeters: ",$X->{'height_in_millimeters'}); MyTestHelpers::diag("root_visual: ",$X->{'root_visual'}); my $visual_info = $X->{'visuals'}->{$X->{'root_visual'}}; MyTestHelpers::diag(" depth: ",$visual_info->{'depth'}); MyTestHelpers::diag(" class: ",$visual_info->{'class'}, ' ', $X->interp('VisualClass', $visual_info->{'class'})); MyTestHelpers::diag(" colormap_entries: ",$visual_info->{'colormap_entries'}); MyTestHelpers::diag(" bits_per_rgb_value: ",$visual_info->{'bits_per_rgb_value'}); MyTestHelpers::diag(" red_mask: ",sprintf('%#X',$visual_info->{'red_mask'})); MyTestHelpers::diag(" green_mask: ",sprintf('%#X',$visual_info->{'green_mask'})); MyTestHelpers::diag(" blue_mask: ",sprintf('%#X',$visual_info->{'blue_mask'})); MyTestHelpers::diag("ima"."ge_byte_order: ",$X->{'ima'.'ge_byte_order'}, ' ', $X->interp('Significance', $X->{'ima'.'ge_byte_order'})); MyTestHelpers::diag("black_pixel: ",sprintf('%#X',$X->{'black_pixel'})); MyTestHelpers::diag("white_pixel: ",sprintf('%#X',$X->{'white_pixel'})); foreach (0 .. 
$#{$X->{'screens'}}) { if ($X->{'screens'}->[$_]->{'root'} == $X->{'root'}) { MyTestHelpers::diag("chosen screen: $_"); } } MyTestHelpers::diag(""); } 1; __END__ HTML-FormatExternal-26/t/taint.t0000644000175000017500000000513112515613721014326 0ustar gggg#!/usr/bin/perl -w # Copyright 2015 Kevin Ryde # This file is part of HTML-FormatExternal. # # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use 5.006; use strict; use warnings; use FindBin; use File::Spec; use Test::More; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings() } require HTML::FormatText::W3m; my $class = 'HTML::FormatText::W3m'; diag $class; eval { require Taint::Util } or plan skip_all => "due to Taint::Util not available"; { my $str = 'hello'; Taint::Util::taint($str); Taint::Util::tainted($str) or plan skip_all => "due to not running in perl -T taint mode"; } plan tests => 35; Taint::Util::untaint($ENV{PATH}); # so programs can run foreach my $class ('HTML::FormatText::Elinks', 'HTML::FormatText::Html2text', 'HTML::FormatText::Links', 'HTML::FormatText::Lynx', 'HTML::FormatText::Netrik', 'HTML::FormatText::W3m', 'HTML::FormatText::Zen', ) { diag $class; use_ok ($class); my $good_taint = sub { my ($str) = @_; return ! 
defined $str || Taint::Util::tainted($str) || $str eq '(not reported)'; # as from Netrik.pm }; { my $version = $class->program_full_version; ok ($good_taint->($version), 'program_full_version() should be tainted'); } { my $version = $class->program_version; ok ($good_taint->($version), 'program_version() should be tainted'); } my $have_program = defined($class->program_full_version); SKIP: { if (! $have_program) { skip "$class program not available", 2; } { my $str = $class->format_string ("

Hello

\n"); ok (Taint::Util::tainted($str), "format_string() tainted"); } { my $testfilename = File::Spec->catfile($FindBin::Bin,'test.html'); my $str = $class->format_file ($testfilename); ok (Taint::Util::tainted($str), "format_file() tainted"); } } } exit 0; HTML-FormatExternal-26/t/Links.t0000755000175000017500000000406312560646136014303 0ustar gggg#!/usr/bin/perl -w # Copyright 2008, 2009, 2010, 2013, 2015 Kevin Ryde # This file is part of HTML-FormatExternal. # # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 3, or (at your option) any # later version. # # HTML-FormatExternal is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY # or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License # for more details. # # You should have received a copy of the GNU General Public License along # with HTML-FormatExternal. If not, see . use strict; use warnings; use Test::More tests => 9; use lib 't'; use MyTestHelpers; BEGIN { MyTestHelpers::nowarnings() } require HTML::FormatText::Links; { my $want_version = 26; is ($HTML::FormatText::Links::VERSION, $want_version, 'VERSION variable'); is (HTML::FormatText::Links->VERSION, $want_version, 'VERSION class method'); ok (eval { HTML::FormatText::Links->VERSION($want_version); 1 }, "VERSION class check $want_version"); my $check_version = $want_version + 1000; ok (! eval { HTML::FormatText::Links->VERSION($check_version); 1 }, "VERSION class check $check_version"); my $formatter = HTML::FormatText::Links->new; is ($formatter->VERSION, $want_version, 'VERSION object method'); ok (eval { $formatter->VERSION($want_version); 1 }, "VERSION object check $want_version"); ok (! 
eval { $formatter->VERSION($check_version); 1 }, "VERSION object check $check_version"); } ## no critic (ProtectPrivateSubs) #----------------------------------------------------------------------------- # _links_mung_charset() foreach my $data (['latin-1', 'latin1'], ['LATIN-2', 'LATIN2'], ) { my ($str, $want) = @$data; is (HTML::FormatText::Links::_links_mung_charset($str), $want, "_links_mung_charset() '$str'"); } exit 0; HTML-FormatExternal-26/README0000644000175000017500000000322412153725307013442 0ustar ggggCopyright 2009, 2013 Kevin Ryde This file is part of HTML-FormatExternal. HTML-FormatExternal is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version. HTML-FormatExternal is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with HTML-FormatExternal. If not, see . HTML-FormatExternal lets you turn HTML into plain text using one of the browsing/formatting programs, elinks http://elinks.cz/ html2text http://www.mbayer.de/html2text/ links http://links.twibright.com/ lynx http://lynx.isc.org/ netrik http://netrik.sourceforge.net/ vilistextum http://bhaak.dyndns.org/vilistextum/ w3m http://sourceforge.net/projects/w3m zen http://www.nocrew.org/software/zen/ The programming interface is compatible with HTML::FormatText and HTML::FormatText::WithLinks, so you can fairly easily switch how you want the formatting done. The programs vary in things like link printing style, levels of support for non-ascii input or output, table output, or HTML 4 constructs. 
The compatible programming interface means you can give a couple of them a try to find what you like most or are most familiar with etc. http://user42.tuxfamily.org/html-formatexternal/index.html HTML-FormatExternal-26/META.json0000644000175000017500000000301312570233261014173 0ustar gggg{ "abstract" : "HTML to text formatting using external programs.", "author" : [ "Kevin Ryde " ], "dynamic_config" : 1, "generated_by" : "ExtUtils::MakeMaker version 6.98, CPAN::Meta::Converter version 2.150005", "license" : [ "gpl_3" ], "meta-spec" : { "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec", "version" : "2" }, "name" : "HTML-FormatExternal", "no_index" : { "directory" : [ "t", "inc", "devel", "xt" ] }, "prereqs" : { "build" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "configure" : { "requires" : { "ExtUtils::MakeMaker" : "0" } }, "runtime" : { "requires" : { "File::Spec" : "0.8", "File::Temp" : "0.18", "IPC::Run" : "0", "URI::file" : "0.08", "constant::defer" : "0", "perl" : "5.006" } }, "test" : { "requires" : { "Test::More" : "0" }, "suggests" : { "HTML::TreeBuilder" : "0", "Taint::Util" : "0" } } }, "release_status" : "stable", "resources" : { "homepage" : "http://user42.tuxfamily.org/html-formatexternal/index.html", "license" : [ "http://www.gnu.org/licenses/gpl.html" ] }, "version" : "26", "x_serialization_backend" : "JSON::PP version 2.27203" } HTML-FormatExternal-26/META.yml0000644000175000017500000000152512570233261014031 0ustar gggg--- abstract: 'HTML to text formatting using external programs.' 
author: - 'Kevin Ryde ' build_requires: ExtUtils::MakeMaker: '0' Test::More: '0' configure_requires: ExtUtils::MakeMaker: '0' dynamic_config: 1 generated_by: 'ExtUtils::MakeMaker version 6.98, CPAN::Meta::Converter version 2.150005' license: gpl meta-spec: url: http://module-build.sourceforge.net/META-spec-v1.4.html version: '1.4' name: HTML-FormatExternal no_index: directory: - t - inc - devel - xt requires: File::Spec: '0.8' File::Temp: '0.18' IPC::Run: '0' URI::file: '0.08' constant::defer: '0' perl: '5.006' resources: homepage: http://user42.tuxfamily.org/html-formatexternal/index.html license: http://www.gnu.org/licenses/gpl.html version: '26' x_serialization_backend: 'CPAN::Meta::YAML version 0.016' HTML-FormatExternal-26/SIGNATURE0000644000175000017500000001044612570233265014052 0ustar ggggThis file contains message digests of all files listed in MANIFEST, signed via the Module::Signature module, version 0.79. To verify the content in this distribution, first make sure you have Module::Signature installed, then type: % cpansign -v It will check each file's integrity, as well as the signature's validity. If "==> Signature verified OK! <==" is not displayed, the distribution may already have been compromised, and you should not run its Makefile.PL or Build.PL. 
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 SHA1 842745cb706f8f2126506f544492f7a80dbe29b3 COPYING SHA1 b60301e01c4920d3bb9b05127265c613f7ad7dda Changes SHA1 e55c822eebd0cedfdfcff32b7eaa0d69c8d2b12c MANIFEST SHA1 d73f6910c2283dca6aa23a5d820950745dcb76f6 MANIFEST.SKIP SHA1 7e6340c659b1a3b2627fce31a7dde91ccfe96a91 META.json SHA1 009eb19db8e3de38af44033385fc371bcb9b25ac META.yml SHA1 6377f445cdab5c5c84fcbed54b4cf9d4bedfa0e4 Makefile.PL SHA1 5add845396738af399dd7808883a7fda1d4c80e8 README SHA1 acb7777c22cc3cea3d7b456d0c8e3d639690cadd debian/changelog SHA1 ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4 debian/compat SHA1 46814236aef21590d7a903d26717da1cdc2df162 debian/control SHA1 060b5f139132e105700a70176f9cd3e25be23513 debian/copyright SHA1 c1a523266cee9adfcc694ccd5187077066eb388f debian/rules SHA1 61652cd1568dcf2614df833eba241755eee34e89 debian/source/format SHA1 97b2a7bbf231ad7c6576c2120da6e217be4e98eb debian/watch SHA1 40aaa9775e1dc755939e88641412d535ae66c7ac devel/base-utf16.html SHA1 a4c6e4a53f16f587e8b05793b94cf0dd0dbe2e1d devel/base.html SHA1 043f4d3097f8cdcb9e7cc23b720ad6d03211a68e devel/element-format.pl SHA1 8abf536e7140fa7b5494c28c16e15465a6cf8ff9 devel/margin12.html SHA1 baa88351b5df57e74e42566676ef567c58a64f5e devel/output-charset.pl SHA1 340c6d269f1434e4ba39d53b84548f0450821ec3 devel/run.pl SHA1 48e454ace186317f76e0f466c2c54baa362db46a devel/slurp.pl SHA1 2af40e15b8891946e9a23d762137c0d0e6e19b04 devel/utf.pl SHA1 73ad4b5ef1db680ce6d417d3d044585068efc392 devel/wide.pl SHA1 5a4e96028a3ba555b32972a14d16c084e140c688 examples/demo.pl SHA1 2c31f38013695c85f0a75eb2c783989978aa298b inc/my_pod2html SHA1 3fd51ba4bdc736f72e287581c208786f42c126ed lib/HTML/FormatExternal.pm SHA1 2fe6c9c3b5bfbc5d026eb2e00b452b2538ebe171 lib/HTML/FormatText/Elinks.pm SHA1 804311a7c6b97311c7ed5ec0f362683cc53c4911 lib/HTML/FormatText/Html2text.pm SHA1 cf71bb6ebabc68dec470eea56b5425591c161a70 lib/HTML/FormatText/Links.pm SHA1 a10f4cdbdb45d599d8937170fab20382ee2dddaf lib/HTML/FormatText/Lynx.pm 
SHA1 b12b6abd08eaadf5a370aec2ef28a6cab2c64112 lib/HTML/FormatText/Netrik.pm SHA1 adce9e9dbea3543cede0197f78aa3c384e0c927f lib/HTML/FormatText/Vilistextum.pm SHA1 0fe01500f991348ae2185a46f869b5583b121319 lib/HTML/FormatText/W3m.pm SHA1 7dc6314817cab2964c5e22c029f718fabc0a4722 lib/HTML/FormatText/Zen.pm SHA1 e8a2275baca79d8f458cce09170da1d2c2b8651c t/Elinks.t SHA1 8ebd16f20ba737d55c1edc6d2cad47272eeb078e t/FormatExternal.t SHA1 25b65048806e411ff8ffd64748cccaf1c51afaf5 t/Links.t SHA1 3855de075da37f5bc4e6a5534b2f00cc14872a98 t/MyTestHelpers.pm SHA1 a64779ce2de9b19b81091a52beecb50bf46fe758 t/taint.t SHA1 0fb013243968897bbd52d598c020d1c6d580c01d t/test.html SHA1 b8b3fa6c4c55938ca8fbff6dd0df7bea900469c9 t/wide.t SHA1 f2ceb7bb4bf68bd24728b90918e63d251fd06183 xt/0-META-read.t SHA1 c9e7a47e1f397eb5f46d7e4ef21754d9df76e36e xt/0-Test-ConsistentVersion.t SHA1 d9e7f38a8dcfdb76c56951e4214be8eeda2e2289 xt/0-Test-DistManifest.t SHA1 9496e4a2a2734c7093ab6961900dd7a55800976e xt/0-Test-Pod.t SHA1 083242d959c1a9242c874982e4585872a95bec93 xt/0-Test-Synopsis.t SHA1 63abbc3297914cd38081e95588fd794f9d1f0306 xt/0-Test-YAML-Meta.t SHA1 1cafa4e23b7a691e09161c0927e6920dd663e2ce xt/0-examples-xrefs.t SHA1 4a897c9b09f37290f507d151e84b6a3c8defd496 xt/0-file-is-part-of.t SHA1 aeb6f41dfae96d04459448f6c3e3dee44691e722 xt/0-no-debug-left-on.t SHA1 4a96ea4a349a7b8364308e377f53d32092087f7b xt/More.t SHA1 e4a15f9b4bcf1d087de2d9ed8e87bf92b0469a6c xt/my-manifest.sh SHA1 cd18ddee73ad0007ba667f0e472ea9d73d0dfa89 xtools/my-check-copyright-years.sh SHA1 b12199bcc70c569ff43f8d8e2ea1c896f587c62f xtools/my-check-spelling.sh SHA1 475e8eb08eac6eecec94a0f9e9928af7d467d2d0 xtools/my-deb.sh SHA1 830acfc76ae9239a2c36143c437a3ec7c6736a0a xtools/my-pc.sh -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iD8DBQFV4TayLFMCIV9q3ToRAshQAJ9+/OwJPRfJXCKkIbyjG7bJC1fdagCfaBlc tMSvYYlL+ErFY8EMpzhh+BQ= =hbr2 -----END PGP SIGNATURE----- HTML-FormatExternal-26/MANIFEST0000644000175000017500000000233112570233262013706 0ustar 
ggggChanges COPYING debian/changelog debian/compat debian/control debian/copyright debian/rules debian/source/format debian/watch devel/base-utf16.html devel/base.html devel/element-format.pl devel/margin12.html devel/output-charset.pl devel/run.pl devel/slurp.pl devel/utf.pl devel/wide.pl examples/demo.pl inc/my_pod2html lib/HTML/FormatExternal.pm lib/HTML/FormatText/Elinks.pm lib/HTML/FormatText/Html2text.pm lib/HTML/FormatText/Links.pm lib/HTML/FormatText/Lynx.pm lib/HTML/FormatText/Netrik.pm lib/HTML/FormatText/Vilistextum.pm lib/HTML/FormatText/W3m.pm lib/HTML/FormatText/Zen.pm Makefile.PL MANIFEST This list of files MANIFEST.SKIP README SIGNATURE t/Elinks.t t/FormatExternal.t t/Links.t t/MyTestHelpers.pm t/taint.t t/test.html t/wide.t xt/0-examples-xrefs.t xt/0-file-is-part-of.t xt/0-META-read.t xt/0-no-debug-left-on.t xt/0-Test-ConsistentVersion.t xt/0-Test-DistManifest.t xt/0-Test-Pod.t xt/0-Test-Synopsis.t xt/0-Test-YAML-Meta.t xt/More.t xt/my-manifest.sh xtools/my-check-copyright-years.sh xtools/my-check-spelling.sh xtools/my-deb.sh xtools/my-pc.sh META.yml Module YAML meta-data (added by MakeMaker) META.json Module JSON meta-data (added by MakeMaker) HTML-FormatExternal-26/MANIFEST.SKIP0000644000175000017500000000604012241340502014443 0ustar gggg#!/usr/bin/perl # MANIFEST.SKIP -- Kevin's various excluded files # Copyright 2008, 2009, 2010, 2011, 2012, 2013 Kevin Ryde # This file is shared among several distributions. # # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3, or (at your option) # any later version. # # This file is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. 
# # You should have received a copy of the GNU General Public License along # with this file. If not, see . # cf. /usr/share/perl/5.14/ExtUtils/MANIFEST.SKIP # emacs backups ~$ # emacs locks (^|/)\.# # emacs autosave (^|/)# # own distdir ^[A-Za-z][A-Za-z0-9-_]*-\d+/ # own dist files \.tar$ \.tar\.gz$ \.deb$ # ExtUtils::MakeMaker leaving Makefile.old # and "myman" leaving MANIFEST.old \.old$ # ExtUtils::MakeMaker "metafile" rule temporary, left behind if interrupted ^META_new\.yml$ # built - MakeMaker ^Makefile$ ^blib ^pm_to_blib ^TAGS$ # MakeMaker 6.18 to 6.25, apparently ^blibdirs\.ts$ # msdos compiler output stuff using gcc, # "XSFILENAME.def" extension, and others fixed names it seems \.def$ ^dll\.base$ ^dll\.exp$ # msdos compiler stuff using ms c \.pdb$ # built - recent Module::Build nonsense ^MYMETA\.yml$ # built - something recent ExtUtils::MakeMaker ^MYMETA\.json$ # built - cdbs and debhelper ^debian/stamp- ^debian/.*\.log$ # built - texinfo.tex temporaries \.(aux|cp|cps|fn|fns|ky|log|pg|toc|tp|tps|vr)$ # built - PGF temporary \.pgf$ # tex or latex output, not distributed for now \.(dvi|pdf|ps)$ # toplevel .c files built from .xs ^[^/]+\.c$ # built .o compiled and .bs from .xs \.o$ \.obj$ \.bs$ # #(^|/)[A-Z][A-Za-z0-9_]*\.c$ #/[^Z])[^/]+\.c$ # built - configury ^a\.out$ ^conftest ^config\.h$ ^myconfig\.h$ # built - toplevel html pages ^[a-zA-Z][^./]+\.html$ # inc/MyMakefileExtras.pm "diff-prev" ^diff\.tmp # inc/MyMakefileExtras.pm "lintian-source" ^temp-lintian # various testing ^tempfile\.png$ ^tempfile \.tmp$ # my dists ^dist-deb ^up$ ^c$ # special case p..ulp test build stuff devel/h2xs/TestConstFoo/ # special case various executables ^devel/exe-[^/.]+$ # special case mall executables ^devel/hblk$ ^devel/mallopt$ ^devel/mallstats$ ^devel/malltrim$ # special case fli executables ^devel/mmap-multi$ # special case widget-bits executables ^devel/grandom$ ^devel/grandom-[a-z]+$ # special case widget-cursor executables ^devel/invisible-blank$ # special case 
x'or executables ^devel/gtk-gc-colour$ # special case combo executables ^devel/toolbutton-overflow-leak$ # special case xpother executables ^devel/encode-all$ ^devel/encode-dump$ ^devel/Encode-X11-xlib$ ^devel/Encode-X11-xlib2$ # special case xpother samples ^devel/encode.*\.ctext$ ^devel/encode.*\.utf8$ # special htmlext environs ^test-dist/ \.junk$ \.bak$ ^backup ^misc ^maybe ^samples ^samp ^formats HTML-FormatExternal-26/debian/0002755000175000017500000000000012570233261014001 5ustar ggggHTML-FormatExternal-26/debian/control0000644000175000017500000000422612515574470015417 0ustar gggg# Copyright 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Kevin Ryde # This file is part of HTML-FormatExternal. # # HTML-FormatExternal is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License as # published by the Free Software Foundation; either version 3, or (at # your option) any later version. # # HTML-FormatExternal is distributed in the hope that it will be # useful, but WITHOUT ANY WARRANTY; without even the implied warranty # of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with HTML-FormatExternal. If not, see # . # Build-Depends could have libtaint-util-perl per META.yml # "prereqs.test.suggests" but it isn't used unless the tests are run # under perl -T taint mode, which isn't usual for the debian build. # # Build-Depends could have "libhtml-tree-perl" for HTML::TreeBuilder # used by a couple of tests. The formatter programs are used by the # tests likewise. But demanding all that is a bit tedious for a # manual build, and in particular "zen" is not in debian as of Apr # 2015. 

Source: libhtml-formatexternal-perl
Section: perl
Priority: optional
Build-Depends: cdbs, debhelper (>= 5)
Maintainer: Kevin Ryde
Standards-Version: 3.9.6
Homepage: http://user42.tuxfamily.org/html-formatexternal/index.html
Bugs: mailto:user42_kevin@yahoo.com.au

Package: libhtml-formatexternal-perl
Architecture: all
Depends: perl (>= 5.6), libconstant-defer-perl, libfile-temp-perl (>= 0.18) | perl (>= 5.8.9), libipc-run-perl, liburi-perl (>= 0.08), ${perl:Depends}, ${misc:Depends}
Recommends: lynx | elinks | links | netrik | vilistextum | w3m | zen
Suggests: elinks, links, lynx, netrik, vilistextum, w3m, zen
Description: Format HTML to text in perl using external programs
 HTML::FormatExternal turns HTML into plain text by running it through
 one of elinks, html2text, links, lynx, netrik, vilistextum, w3m or zen.
 .
 The programming interface is compatible with HTML::FormatText and
 other HTML::Formatter classes.
HTML-FormatExternal-26/debian/watch0000644000175000017500000000176211236733627015047 0ustar gggg
# Web site version watch, for devscripts uscan program.
# Copyright 2008, 2009 Kevin Ryde
#
# This file is part of HTML-FormatExternal.
#
# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 3, or (at
# your option) any later version.
#
# HTML-FormatExternal is distributed in the hope that it will be
# useful, but WITHOUT ANY WARRANTY; without even the implied warranty
# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with HTML-FormatExternal. If not, see
# .

# Crib notes:
# "man uscan" describes the format.
# test with: uscan --report --verbose --watchfile=debian/watch

version=3
http://user42.tuxfamily.org/html-formatexternal/index.html \
  .*/HTML-FormatExternal-([0-9]+)\.tar\.gz
HTML-FormatExternal-26/debian/source/0002755000175000017500000000000012570233261015301 5ustar gggg
HTML-FormatExternal-26/debian/source/format0000644000175000017500000000000411357715142016511 0ustar gggg
1.0
HTML-FormatExternal-26/debian/compat0000644000175000017500000000000111155332372015175 0ustar gggg
5
HTML-FormatExternal-26/debian/copyright0000644000175000017500000000057312515553763015752 0ustar gggg
HTML-FormatExternal is Copyright 2008, 2009, 2010, 2011, 2013, 2015 Kevin Ryde

HTML-FormatExternal is licensed under the GNU General Public License
GPL version 3 or at your option any later version. The complete text
of GPL version 3 is in the file /usr/share/common-licenses/GPL-3

The program home page is
http://user42.tuxfamily.org/html-formatexternal/index.html
HTML-FormatExternal-26/debian/changelog0000644000175000017500000000023712560646141015657 0ustar gggg
libhtml-formatexternal-perl (26-0.1) unstable; urgency=low

  * Packaged version.

 -- Kevin Ryde Thu, 06 Aug 2015 22:00:27 +1000
HTML-FormatExternal-26/debian/rules0000755000175000017500000000160412374741317015070 0ustar gggg
#!/usr/bin/make -f
# Copyright 2008, 2014 Kevin Ryde
# This file is part of HTML-FormatExternal.
#
# HTML-FormatExternal is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 3, or (at your option) any
# later version.
#
# HTML-FormatExternal is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
# or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
# for more details.
#
# You should have received a copy of the GNU General Public License along
# with HTML-FormatExternal. If not, see .
include /usr/share/cdbs/1/rules/debhelper.mk
include /usr/share/cdbs/1/class/perl-makemaker.mk

DEB_INSTALL_EXAMPLES_libhtml-formatexternal-perl = examples/*