HTML-FormatText-WithLinks-AndTables-0.07/LICENSE

The Artistic License 2.0

Copyright (c) 2008-2015 Shaun Fryer, 2015 Dale Evans

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

This license establishes the terms under which a given free software Package may be copied, modified, distributed, and/or redistributed. The intent is that the Copyright Holder maintains some artistic control over the development of that Package while still keeping the Package available as open source and free software.

You are always permitted to make arrangements wholly outside of this license directly with the Copyright Holder of a given Package. If the terms of this license do not permit the full use that you propose to make of the Package, you should contact the Copyright Holder and seek a different licensing arrangement.

Definitions

"Copyright Holder" means the individual(s) or organization(s) named in the copyright notice for the entire Package.

"Contributor" means any party that has contributed code or other material to the Package, in accordance with the Copyright Holder's procedures.

"You" and "your" means any person who would like to copy, distribute, or modify the Package.

"Package" means the collection of files distributed by the Copyright Holder, and derivatives of that collection and/or of those files. A given Package may consist of either the Standard Version, or a Modified Version.

"Distribute" means providing a copy of the Package or making it accessible to anyone else, or in the case of a company or organization, to others outside of your company or organization.

"Distributor Fee" means any fee that you charge for Distributing this Package or providing support for this Package to another party. It does not mean licensing fees.

"Standard Version" refers to the Package if it has not been modified, or has been modified only in ways explicitly requested by the Copyright Holder.

"Modified Version" means the Package, if it has been changed, and such changes were not explicitly requested by the Copyright Holder.

"Original License" means this Artistic License as Distributed with the Standard Version of the Package, in its current version or as it may be modified by The Perl Foundation in the future.

"Source" form means the source code, documentation source, and configuration files for the Package.

"Compiled" form means the compiled bytecode, object code, binary, or any other form resulting from mechanical transformation or translation of the Source form.

Permission for Use and Modification Without Distribution

(1) You are permitted to use the Standard Version and create and use Modified Versions for any purpose without restriction, provided that you do not Distribute the Modified Version.

Permissions for Redistribution of the Standard Version

(2) You may Distribute verbatim copies of the Source form of the Standard Version of this Package in any medium without restriction, either gratis or for a Distributor Fee, provided that you duplicate all of the original copyright notices and associated disclaimers. At your discretion, such verbatim copies may or may not include a Compiled form of the Package.

(3) You may apply any bug fixes, portability changes, and other modifications made available from the Copyright Holder. The resulting Package will still be considered the Standard Version, and as such will be subject to the Original License.

Distribution of Modified Versions of the Package as Source

(4) You may Distribute your Modified Version as Source (either gratis or for a Distributor Fee, and with or without a Compiled form of the Modified Version) provided that you clearly document how it differs from the Standard Version, including, but not limited to, documenting any non-standard features, executables, or modules, and provided that you do at least ONE of the following:

(a) make the Modified Version available to the Copyright Holder of the Standard Version, under the Original License, so that the Copyright Holder may include your modifications in the Standard Version.

(b) ensure that installation of your Modified Version does not prevent the user installing or running the Standard Version. In addition, the Modified Version must bear a name that is different from the name of the Standard Version.

(c) allow anyone who receives a copy of the Modified Version to make the Source form of the Modified Version available to others under (i) the Original License or (ii) a license that permits the licensee to freely copy, modify and redistribute the Modified Version using the same licensing terms that apply to the copy that the licensee received, and requires that the Source form of the Modified Version, and of any works derived from it, be made freely available in that license fees are prohibited but Distributor Fees are allowed.

Distribution of Compiled Forms of the Standard Version or Modified Versions without the Source

(5) You may Distribute Compiled forms of the Standard Version without the Source, provided that you include complete instructions on how to get the Source of the Standard Version. Such instructions must be valid at the time of your distribution. If these instructions, at any time while you are carrying out such distribution, become invalid, you must provide new instructions on demand or cease further distribution. If you provide valid instructions or cease distribution within thirty days after you become aware that the instructions are invalid, then you do not forfeit any of your rights under this license.

(6) You may Distribute a Modified Version in Compiled form without the Source, provided that you comply with Section 4 with respect to the Source of the Modified Version.

Aggregating or Linking the Package

(7) You may aggregate the Package (either the Standard Version or Modified Version) with other packages and Distribute the resulting aggregation provided that you do not charge a licensing fee for the Package. Distributor Fees are permitted, and licensing fees for other components in the aggregation are permitted. The terms of this license apply to the use and Distribution of the Standard or Modified Versions as included in the aggregation.

(8) You are permitted to link Modified and Standard Versions with other works, to embed the Package in a larger work of your own, or to build stand-alone binary or bytecode versions of applications that include the Package, and Distribute the result without restriction, provided the result does not expose a direct interface to the Package.

Items That are Not Considered Part of a Modified Version

(9) Works (including, but not limited to, modules and scripts) that merely extend or make use of the Package, do not, by themselves, cause the Package to be a Modified Version. In addition, such works are not considered parts of the Package itself, and are not subject to the terms of this license.

General Provisions

(10) Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

(11) If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

(12) This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

(13) This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

(14) Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

HTML-FormatText-WithLinks-AndTables-0.07/Changes

Revision history for HTML-FormatText-WithLinks-AndTables

0.07 Fri 16 Dec 2016
    Strips \240 characters created by HTML::Formatter when it encounters a &nbsp;

0.06 Tue 21 May 2015
    Fixed usage of Dist::Zilla::Plugin::OurPkgVersion so that MetaCPAN will hopefully accept this version
    Fixed issue with custom formatting parameters not being passed around properly.

0.05 Tue 21 May 2015
    Fixed bug with empty tags.

0.04 Tue 19 May 2015
    Documentation fix.

0.03 Tue 19 May 2015
    New maintainer, Dale Evans http://search.cpan.org/~daleevans/
    Handle table headers by treating them like <td> (patch from Alex Aminoff, NBER)
    Some typos fixed (patch from Fabrizio Regalli)
    Handle empty table rows without crashing
    Return () when called in an array context on undefined HTML
    Converted to Dist::Zilla

0.02 Thu 7 Jun 2012
    Modified so content text of "0" prints properly

0.01 Wed 10 Dec 2008
    First version, released on an unsuspecting world.

HTML-FormatText-WithLinks-AndTables-0.07/t/pod.t

#!perl -T

use strict;
use warnings;
use Test::More;

# Ensure a recent version of Test::Pod
my $min_tp = 1.22;
eval "use Test::Pod $min_tp";
plan skip_all => "Test::Pod $min_tp required for testing POD" if $@;

all_pod_files_ok();

HTML-FormatText-WithLinks-AndTables-0.07/dist.ini

name = HTML-FormatText-WithLinks-AndTables
copyright_holder = Shaun Fryer, Dale Evans
version = 0.07
author = Shaun Fryer , Dale Evans

[GatherDir]
[PruneCruft]
[ManifestSkip]
[MetaYAML]
[ExtraTests]
[ExecDir]
[ShareDir]
[MakeMaker]
[Manifest]
[TestRelease]
[ConfirmRelease]
[UploadToCPAN]
[Repository]
repository = https://github.com/daleevans/HTML-FormatText-WithLinks-AndTables
[OurPkgVersion]
[PodVersion]
[PodSyntaxTests]
[PodCoverageTests]
[Test::Perl::Critic]

[Prereqs]
HTML::FormatText = 0
Test::More = 0
HTML::TreeBuilder = 0
HTML::FormatText::WithLinks = 0

HTML-FormatText-WithLinks-AndTables-0.07/META.yml

---
abstract: 'Converts HTML to Text with tables intact'
author:
  - 'Shaun Fryer , Dale Evans '
build_requires: {}
configure_requires:
  ExtUtils::MakeMaker: '0'
dynamic_config: 0
generated_by: 'Dist::Zilla version 6.008, CPAN::Meta::Converter version 2.150001'
license: perl
meta-spec:
  url: http://module-build.sourceforge.net/META-spec-v1.4.html
  version: '1.4'
name: HTML-FormatText-WithLinks-AndTables
requires:
  HTML::FormatText: '0'
  HTML::FormatText::WithLinks: '0'
  HTML::TreeBuilder: '0'
  Test::More: '0'
resources:
  repository: https://github.com/daleevans/HTML-FormatText-WithLinks-AndTables
version: '0.07'
x_serialization_backend: 'YAML::Tiny version 1.66'

HTML-FormatText-WithLinks-AndTables-0.07/MANIFEST

# This file was automatically generated by Dist::Zilla::Plugin::Manifest v6.008.
Changes
LICENSE
MANIFEST
META.yml
Makefile.PL
README.pod
dist.ini
lib/HTML/FormatText/WithLinks/AndTables.pm
t/00-load.t
t/author-critic.t
t/author-pod-coverage.t
t/author-pod-syntax.t
t/boilerplate.t
t/empty_row.t
t/empty_td.t
t/empty_td_warning.t
t/html-formattext-withlinks-andtables.t
t/pod.t
t/preserve_options.t
t/table-header.t

HTML-FormatText-WithLinks-AndTables-0.07/README.pod

=head1 NAME

HTML::FormatText::WithLinks::AndTables - Converts HTML to Text with tables intact

=head1 SYNOPSIS

    use HTML::FormatText::WithLinks::AndTables;

    my $text = HTML::FormatText::WithLinks::AndTables->convert($html);

Or optionally...

    my $conf = { # same options as HTML::FormatText::WithLinks, except as below
        cellpadding   => 2, # defaults to 1
        no_rowspacing => 1, # bool, suppress vertical space between table rows
    };

    my $text = HTML::FormatText::WithLinks::AndTables->convert($html, $conf);

=head1 DESCRIPTION

This module was inspired by HTML::FormatText::WithLinks, which has proven to be a useful `lynx -dump` work-alike. However, one frustration was that none of the other HTML converters I came across could deal effectively with HTML <table>s. This module can, in a rudimentary sense, do so.

The aim was to take a simple HTML-based email template and also convert it to text with the <table> structure intact, for inclusion as "multipart/alternative" content. Further, it preserves the alignment specified by the <td> tag's "align" attribute, and it preserves multiline text inside a <td> element, provided the lines are broken using <br> tags.

=head2 EXPORT

None by default.

=head1 METHODS

=head2 convert

Takes a string of HTML and an optional hash reference of options, and returns the HTML rendered as plain text. The options are those of HTML::FormatText::WithLinks plus cellpadding and no_rowspacing (see SYNOPSIS). If the HTML is undefined, convert() returns undef in scalar context and an empty list in list context; an empty string yields an empty string.

Note that convert() removes the cellpadding and no_rowspacing keys from the hash you pass, and that options are kept at package level, so options passed to one call stay in effect for later calls in the same process.

=head1 EXAMPLE

Given the HTML below ...
Name: Mr. Foo Bar
Address: #1-276 Quux Lane,
Schenectady, NY, USA,
12345
Email: foo@bar.baz
... the (default) return value of convert() will be as follows.

       Name: Mr. Foo Bar
    Address: #1-276 Quux Lane,
             Schenectady, NY, USA,
             12345
      Email: [1]foo@bar.baz

    1. mailto:foo@bar.baz

=head1 SEE ALSO

HTML::FormatText::WithLinks

HTML::TreeBuilder

=head1 CAVEATS

* <th> elements are laid out like <td> elements, except that their content is always centered (their "align" attribute is ignored).

* It assumes a fixed-width font for display of the resulting text.

* It doesn't work well on nested <table>s or other nested blocks within <table>s.

=head1 AUTHOR

Shaun Fryer, C<< >> (author emeritus)

Dale Evans, C<< >> (current maintainer)

=head1 BUGS

Please report any bugs or feature requests to C, or through the web interface at L. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc HTML::FormatText::WithLinks::AndTables

You can also look for information at:

=over 4

=item * RT: CPAN's request tracker

L

=item * AnnoCPAN: Annotated CPAN documentation

L

=item * CPAN Ratings

L

=item * Search CPAN

L

=back

=head1 ACKNOWLEDGEMENTS

Everybody. :)
L

=head1 COPYRIGHT & LICENSE

Copyright 2008 Shaun Fryer, all rights reserved.

Copyright 2015 Dale Evans, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

=for Pod::Coverage configure

HTML-FormatText-WithLinks-AndTables-0.07/t/00-load.t

#!perl -T

use Test::More tests => 1;

BEGIN {
    use_ok( 'HTML::FormatText::WithLinks::AndTables' );
}

diag( "Testing HTML::FormatText::WithLinks::AndTables $HTML::FormatText::WithLinks::AndTables::VERSION, Perl $], $^X" );

HTML-FormatText-WithLinks-AndTables-0.07/Makefile.PL

# This file was automatically generated by Dist::Zilla::Plugin::MakeMaker v6.008.
use strict;
use warnings;

use ExtUtils::MakeMaker;

my %WriteMakefileArgs = (
  "ABSTRACT" => "Converts HTML to Text with tables intact",
  "AUTHOR" => "Shaun Fryer , Dale Evans ",
  "CONFIGURE_REQUIRES" => {
    "ExtUtils::MakeMaker" => 0
  },
  "DISTNAME" => "HTML-FormatText-WithLinks-AndTables",
  "LICENSE" => "perl",
  "NAME" => "HTML::FormatText::WithLinks::AndTables",
  "PREREQ_PM" => {
    "HTML::FormatText" => 0,
    "HTML::FormatText::WithLinks" => 0,
    "HTML::TreeBuilder" => 0,
    "Test::More" => 0
  },
  "VERSION" => "0.07",
  "test" => {
    "TESTS" => "t/*.t"
  }
);

my %FallbackPrereqs = (
  "HTML::FormatText" => 0,
  "HTML::FormatText::WithLinks" => 0,
  "HTML::TreeBuilder" => 0,
  "Test::More" => 0
);

unless ( eval { ExtUtils::MakeMaker->VERSION(6.63_03) } ) {
  delete $WriteMakefileArgs{TEST_REQUIRES};
  delete $WriteMakefileArgs{BUILD_REQUIRES};
  $WriteMakefileArgs{PREREQ_PM} = \%FallbackPrereqs;
}

delete $WriteMakefileArgs{CONFIGURE_REQUIRES}
  unless eval { ExtUtils::MakeMaker->VERSION(6.52) };

WriteMakefile(%WriteMakefileArgs);

HTML-FormatText-WithLinks-AndTables-0.07/t/empty_td.t

#!/usr/bin/perl
use HTML::FormatText::WithLinks::AndTables;
use Test::More tests => 1;

my $html ='
 
';
my $text = HTML::FormatText::WithLinks::AndTables->convert($html, {rm=>80,cellpadding=>2});
#print "got: '$text'\n";
#print "expected: '$expected'\n";
ok($text =~ /^\s+$/s,"blank output, no token strings");

HTML-FormatText-WithLinks-AndTables-0.07/t/empty_row.t

#!/usr/bin/perl
use HTML::FormatText::WithLinks::AndTables;
use Test::More tests => 1;

my $html ='
cell 1 cell 2
';
my $text = HTML::FormatText::WithLinks::AndTables->convert($html, {rm=>80,cellpadding=>2});
my $expected = ' cell 1 cell 2 ';
#print "got: '$text'\n";
#print "expected: '$expected'\n";
ok($expected eq $text,"empty table row handled");

HTML-FormatText-WithLinks-AndTables-0.07/t/boilerplate.t

#!perl -T

use strict;
use warnings;
use Test::More tests => 3;

sub not_in_file_ok {
    my ($filename, %regex) = @_;
    open( my $fh, '<', $filename )
        or die "couldn't open $filename for reading: $!";

    my %violated;

    while (my $line = <$fh>) {
        while (my ($desc, $regex) = each %regex) {
            if ($line =~ $regex) {
                push @{$violated{$desc}||=[]}, $.;
            }
        }
    }

    if (%violated) {
        fail("$filename contains boilerplate text");
        diag "$_ appears on lines @{$violated{$_}}" for keys %violated;
    } else {
        pass("$filename contains no boilerplate text");
    }
}

sub module_boilerplate_ok {
    my ($module) = @_;
    not_in_file_ok($module =>
        'the great new $MODULENAME'   => qr/ - The great new /,
        'boilerplate description'     => qr/Quick summary of what the module/,
        'stub function definition'    => qr/function[12]/,
    );
}

TODO: {
  local $TODO = "Need to replace the boilerplate text";

  not_in_file_ok("README.pod" =>
    "The README is used..."       => qr/The README is used/,
    "'version information here'"  => qr/to provide version information/,
  );

  not_in_file_ok(Changes =>
    "placeholder date/time"       => qr(Date/time)
  );

  module_boilerplate_ok('lib/HTML/FormatText/WithLinks/AndTables.pm');
}

HTML-FormatText-WithLinks-AndTables-0.07/t/table-header.t

#!/usr/bin/perl
use HTML::FormatText::WithLinks::AndTables;
use Test::More tests => 1;

my $html ='
header1
';
my $text = HTML::FormatText::WithLinks::AndTables->convert($html, {rm=>80,cellpadding=>2});
my $expected = ' header1 ';
#print "expected: $expected\n";
#print "got: $text\n";
ok($expected eq $text,"table header displayed");

HTML-FormatText-WithLinks-AndTables-0.07/t/author-critic.t

#!perl

BEGIN {
  unless ($ENV{AUTHOR_TESTING}) {
    print qq{1..0 # SKIP these tests are for testing by the author\n};
    exit
  }
}

use strict;
use warnings;

use Test::More;
use English qw(-no_match_vars);

eval "use Test::Perl::Critic";
plan skip_all => 'Test::Perl::Critic required to criticise code' if $@;
Test::Perl::Critic->import( -profile => "perlcritic.rc" ) if -e "perlcritic.rc";
all_critic_ok();

HTML-FormatText-WithLinks-AndTables-0.07/t/empty_td_warning.t

#!/usr/bin/perl
use HTML::FormatText::WithLinks::AndTables;
use Test::More tests => 1;

my $html ='
';
{
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_; };
    my $text = HTML::FormatText::WithLinks::AndTables->convert($html, {rm=>80,cellpadding=>2});
    ok(@warnings == 0,"no warnings from empty cell");
}

HTML-FormatText-WithLinks-AndTables-0.07/t/preserve_options.t

#!/usr/bin/perl
use HTML::FormatText::WithLinks::AndTables;
use Test::More tests => 1;

my $html = ' Link
Cell link
';
my $text = HTML::FormatText::WithLinks::AndTables->convert($html, {
    footnote    => '',
    after_link  => ' (%l)',
    before_link => '',
    leftmargin  => 0,
});
#print "got: '$text'\n";
my $expected = ' Link (http://example.com) Cell link (http://example.com/foo) ';
#print "expected: '$expected'\n";
ok($text eq $expected,"links inside tables have after_link applied properly");

HTML-FormatText-WithLinks-AndTables-0.07/t/author-pod-syntax.t

#!perl

BEGIN {
  unless ($ENV{AUTHOR_TESTING}) {
    print qq{1..0 # SKIP these tests are for testing by the author\n};
    exit
  }
}

# This file was automatically generated by Dist::Zilla::Plugin::PodSyntaxTests.
use strict; use warnings;
use Test::More;
use Test::Pod 1.41;

all_pod_files_ok();

HTML-FormatText-WithLinks-AndTables-0.07/t/author-pod-coverage.t

#!perl

BEGIN {
  unless ($ENV{AUTHOR_TESTING}) {
    print qq{1..0 # SKIP these tests are for testing by the author\n};
    exit
  }
}

# This file was automatically generated by Dist::Zilla::Plugin::PodCoverageTests.

use Test::Pod::Coverage 1.08;
use Pod::Coverage::TrustPod;

all_pod_coverage_ok({ coverage_class => 'Pod::Coverage::TrustPod' });

HTML-FormatText-WithLinks-AndTables-0.07/t/html-formattext-withlinks-andtables.t

use strict;
use HTML::FormatText::WithLinks::AndTables;

my $html = q|

BIG

How's it going?
How's it going?
How's it going?
How's it going?
How's it going?
How's it going?

Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec hendrerit venenatis dolor. Suspendisse in neque id odio auctor porttitor. Cras adipiscing, orci in venenatis semper, nibh tortor posuere magna, ac blandit sapien purus et ligula. Proin et libero. Duis pellentesque, tellus a viverra pretium, lacus urna fermentum elit, et tempor nibh urna ac erat. Sed suscipit, enim in vulputate aliquam, mi ligula viverra enim, vitae mollis tortor metus ac sapien. Maecenas risus ligula, viverra eget, sagittis at, ultrices in, ante. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Nunc dictum. Praesent gravida neque quis odio. Mauris lacus nulla, iaculis eu, commodo sit amet, molestie sed, diam. In vel ligula.
#1###
#2#########
#3###
#4###
#5###
%1%%%
%2%%%
%3%%%
%4%%%
%5%%%%%%%%%%%
booo!
#6###
#7#########
#8###
#9###
#10##
%6%%%
%7%%%
%8%%%
%9%%%
%10%%%%%%%%%%
http://xyz |; my $text = HTML::FormatText::WithLinks::AndTables->convert($html, {rm=>80,cellpadding=>2}); my $expected = q| BIG === How's it going? How's it going? How's it going? How's it going? How's it going? How's it going? Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec hendrerit venenatis dolor. Suspendisse in neque id odio auctor porttitor. Cras adipiscing, orci in venenatis semper, nibh tortor posuere magna, ac blandit sapien purus et ligula. Proin et libero. Duis pellentesque, tellus a viverra pretium, lacus urna fermentum elit, et tempor nibh urna ac erat. Sed suscipit, enim in vulputate aliquam, mi ligula viverra enim, vitae mollis tortor metus ac sapien. Maecenas risus ligula, viverra eget, sagittis at, ultrices in, ante. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Nunc dictum. Praesent gravida neque quis odio. Mauris lacus nulla, iaculis eu, commodo sit amet, molestie sed, diam. In vel ligula. #1### %1%%% #2######### %2%%% #3### %3%%% #4### %4%%% #5### %5%%%%%%%%%%% booo! #6### %6%%% #7######### %7%%% #8### %8%%% #9### %9%%% #10## %10%%%%%%%%%% [1]http://xyz 1. 
http://xyz |;

use Test::More tests=>1;
is $text, $expected, 'HTML::FormatText::WithLinks::AndTables->convert($html,{rm=>80,cellpadding=>2})';

HTML-FormatText-WithLinks-AndTables-0.07/lib/HTML/FormatText/WithLinks/AndTables.pm

package HTML::FormatText::WithLinks::AndTables;

use strict;
use warnings;

our $VERSION = '0.07'; # VERSION

use base 'HTML::FormatText::WithLinks';
use HTML::TreeBuilder;

################################################################################
# configuration defaults
################################################################################
my $cellpadding   = 1; # number of horizontal spaces to pad interior of <td> cells
my $no_rowspacing = 0; # boolean, suppress space between table rows and rows with empty <td>s
################################################################################

=head1 NAME

HTML::FormatText::WithLinks::AndTables - Converts HTML to Text with tables intact

=head1 VERSION

version 0.07

=cut

=head1 SYNOPSIS

    use HTML::FormatText::WithLinks::AndTables;

    my $text = HTML::FormatText::WithLinks::AndTables->convert($html);

Or optionally...

    my $conf = { # same options as HTML::FormatText::WithLinks, except as below
        cellpadding   => 2, # defaults to 1
        no_rowspacing => 1, # bool, suppress vertical space between table rows
    };

    my $text = HTML::FormatText::WithLinks::AndTables->convert($html, $conf);

=head1 DESCRIPTION

This module was inspired by HTML::FormatText::WithLinks, which has proven to be a useful `lynx -dump` work-alike. However, one frustration was that none of the other HTML converters I came across could deal effectively with HTML <table>s. This module can, in a rudimentary sense, do so.

The aim was to take a simple HTML-based email template and also convert it to text with the <table> structure intact, for inclusion as "multipart/alternative" content. Further, it preserves the alignment specified by the <td> tag's "align" attribute, and it preserves multiline text inside a <td> element, provided the lines are broken using <br> tags.

=head2 EXPORT

None by default.

=head1 METHODS

=head2 convert

Takes a string of HTML and an optional hash reference of options, and returns the HTML rendered as plain text. The options are those of HTML::FormatText::WithLinks plus cellpadding and no_rowspacing (see SYNOPSIS). If the HTML is undefined, convert() returns undef in scalar context and an empty list in list context; an empty string yields an empty string.

Note that convert() removes the cellpadding and no_rowspacing keys from the hash you pass, and that options are kept at package level, so options passed to one call stay in effect for later calls in the same process.

=cut

my $parser_indent = 3; # HTML::FormatText::WithLinks adds this indent to data
my $conf_defaults = {};

# the one and only public interface
sub convert {
    shift if defined $_[0] and $_[0] eq __PACKAGE__; # to make it function friendly
    my ($html, $conf) = @_;

    # over-ride our defaults
    if ($conf and ref $conf eq 'HASH') {
        $no_rowspacing = $$conf{no_rowspacing} if $$conf{no_rowspacing};
        delete $$conf{no_rowspacing};
        $cellpadding = $$conf{cellpadding} if $$conf{cellpadding};
        delete $$conf{cellpadding};
        %$conf_defaults = (%$conf_defaults, %$conf);
    }

    return __PACKAGE__->new->parse($html);
}

# sub-class configure
sub configure {
    # SUPER::configure actually modifies the hash, so we need to pass a copy
    my %configure = %$conf_defaults;
    shift()->SUPER::configure(\%configure);
}

# sub-class parse
sub parse {
    my $self = shift;
    my $html = shift;

    return unless defined $html;
    return '' if $html eq '';

    my $tree = HTML::TreeBuilder->new->parse( $html );
    return $self->_format_tables( $tree ); # we work our magic...
}

# a private method
sub _format_tables {
    my $self = shift;
    my $tree = shift;

    my $formatted_tables = []; # a nested stack for our formatted table text

    # the result of an all night programming session...
    #
    # essentially we take two passes over each table
    # and modify the structure of text and html by replacing each <td>'s
    # content with tokens,
    # then replacing the tokens after _parse() has converted it to text
    #
    # for each <table> ...
    # we grab all its <td> inner text (and/or parsed html), rearrange it into a
    # single string of formatted text, and put a token into its first <td>;
    # once we have processed the html with _parse(), we replace the tokens with the
    # corresponding formatted text

    my @tables = $tree->look_down(_tag=>'table');
    my $table_count = 0;
    for my $table (@tables) {
        $formatted_tables->[$table_count] = [];
        my @trs = $table->look_down(_tag=>'tr');
        my @max_col_width;   # max column widths by index
        my @max_col_heights; # max column heights (for multi-line text) by index
        my @col_lines;       # a stack for our redesigned rows of column (<td>) text

        FIRST_PASS: {
            my $row_count = 0; # obviously a counter...
            for my $tr (@trs) { # *** 1st pass over rows
                $max_col_heights[$row_count] = 0;
                $col_lines[$row_count] = [];
                my @cols = $tr->look_down(_tag=>qr/^(td|th)$/); # colspan/rowspan are ignored
                for (my $i = 0; $i < scalar @cols; $i++) {
                    my $td = $cols[$i]->clone;
                    my $new_tree = HTML::TreeBuilder->new;
                    $new_tree->{_content} = [ $td ];
                    # parse the contents of the td into text
                    # this doesn't work well with nested tables...
                    my $text = __PACKAGE__->new->_parse($new_tree);
                    # we don't want leading or trailing whitespace
                    $text =~ s/\xA0+/ /g; # &nbsp; -> space (every run, not just the first)
                    $text =~ s/^\s+//s;
                    $text =~ s/\s+\z//s;
                    # now we figure out the maximum widths and heights needed for each column
                    my $max_line_width = 0;
                    my @lines = split "\n", $text; # take the parsed text and break it into virtual rows
                    $max_col_heights[$row_count] = scalar @lines if scalar @lines > $max_col_heights[$row_count];
                    for my $line (@lines) {
                        my $line_width = length $line;
                        $max_line_width = $line_width if $line_width > $max_line_width;
                    }
                    $cols[$i]->{_content} = [ $text ];
                    $max_col_width[$i] ||= 0;
                    $max_col_width[$i] = $max_line_width if $max_line_width > $max_col_width[$i];
                    # now put the accumulated lines onto our stack
                    $col_lines[$row_count]->[$i] = \@lines;
                }
                $tr->{_content} = \@cols;
                $row_count++;
            }
        }

        SECOND_PASS: {
            my $row_count = 0; # obviously, another counter...
            for my $tr (@trs) { # *** 2nd pass over rows
                my @cols = $tr->look_down(_tag=>qr/^(td|th)$/); # colspan/rowspan are ignored
                my $row_text; # the final string representing each row of reformatted text
                my @col_rows; # a stack for each virtual $new_line spliced together from a group of <td>s
                # iterate over each column of the maximum rows of parsed multiline text per <tr>
                # for each virtual row of each virtual column, concat the text with alignment spacings
                # the final concatenated string value will be placed in column 0
                for (my $j = 0; $j < $max_col_heights[$row_count]; $j++) {
                    my $new_line;
                    for (my $i = 0; $i < scalar @cols; $i++) { # here are the actual <td> elements we're iterating over...
                        my $width = $max_col_width[$i] + $cellpadding; # how wide is this column of text
                        my $line = $col_lines[$row_count]->[$i]->[$j]; # get the text to fit into it
                        $line = defined $line ? $line : '';
                        # strip the whitespace from beginning and end of each line
                        $line =~ s/^\s+//gs;
                        $line =~ s/\s+\z//gs;
                        my $n_space = $width - length $line; # the difference between the column and text widths
                        # we are creating virtual rows of text within a single <td>
                        # so we need to add an indent to all but the first row to
                        # match the indent added by _parse() for presenting table contents
                        $line = ((' ')x$parser_indent). $line if $j != 0 and $i == 0;
                        # here we adjust the text alignment by wrapping the text in occulted whitespace
                        my $justify = $cols[$i]->tag eq 'td' ? ( $cols[$i]->attr('align') || 'left' ) : 'center';
                        if ($justify eq 'center') {
                            my $pre = int( ($n_space + $cellpadding) / 2 ); # divide remaining space in half
                            my $post = $n_space - $pre; # assign any uneven remainder to the end
                            $new_line .= ((' ')x$pre). $line .((' ')x$post); # wrap the text in spaces
                        } elsif ($justify eq 'left') {
                            $new_line .= ((' ')x$cellpadding). $line .((' ')x$n_space);
                        } else {
                            $new_line .= ((' ')x$n_space). $line .((' ')x$cellpadding);
                        }
                    }
                    $new_line .= "\n" if $j != $max_col_heights[$row_count] - 1; # add a newline to all but the last text row
                    $col_rows[$j] = $new_line; # put the line into the stack for this row
                }
                $row_text .= $_ for @col_rows;
                for (my $i = 1; $i < scalar @cols; $i++) {
                    $cols[$i]->delete; # get rid of unneeded <td>s
                }
                # put the fully formatted text into our accumulator
                $formatted_tables->[$table_count]->[$row_count] = $row_text;
                if (scalar @cols) {
                    $cols[0]->content->[0] = "__TOKEN__${table_count}__${row_count}__"; # place a token into the row at col 0
                }
                $row_count++;
            }
        }
        $table_count++;
    }

    # now replace our tokens
    my $text = $self->_parse( $tree );
    for (my $i = 0; $i < scalar @$formatted_tables; $i++) {
        for (my $j = 0; $j < scalar @{ $$formatted_tables[$i] }; $j++) {
            my $token = "__TOKEN__${i}__${j}__";
            $token .= "\n?" if $no_rowspacing;
            my $new_text = $$formatted_tables[$i][$j];
            if (defined $new_text) {
                $text =~ s/$token/$new_text/;
            } else {
                $text =~ s/$token//;
            }
        }
    }

    return $text;
}

1;

__END__

=head1 EXAMPLE

Given the HTML below ...
Name: Mr. Foo Bar
Address: #1-276 Quux Lane,
Schenectady, NY, USA,
12345
Email: foo@bar.baz
... the (default) return value of convert() will be as follows.

       Name: Mr. Foo Bar
    Address: #1-276 Quux Lane,
             Schenectady, NY, USA,
             12345
      Email: [1]foo@bar.baz

    1. mailto:foo@bar.baz

=head1 SEE ALSO

HTML::FormatText::WithLinks

HTML::TreeBuilder

=head1 CAVEATS

* <th> elements are laid out like <td> elements, except that their content is always centered (their "align" attribute is ignored).

* It assumes a fixed-width font for display of the resulting text.

* It doesn't work well on nested <table>s or other nested blocks within <table>s.

=head1 AUTHOR

Shaun Fryer, C<< >> (author emeritus)

Dale Evans, C<< >> (current maintainer)

=head1 BUGS

Please report any bugs or feature requests to C, or through the web interface at L. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc HTML::FormatText::WithLinks::AndTables

You can also look for information at:

=over 4

=item * RT: CPAN's request tracker

L

=item * AnnoCPAN: Annotated CPAN documentation

L

=item * CPAN Ratings

L

=item * Search CPAN

L

=back

=head1 ACKNOWLEDGEMENTS

Everybody. :)
L

=head1 COPYRIGHT & LICENSE

Copyright 2008 Shaun Fryer, all rights reserved.

Copyright 2015 Dale Evans, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

=for Pod::Coverage configure

=cut
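The second pass of _format_tables() pads every line of every cell to its column's width according to the cell's alignment. The sketch below isolates that padding rule so it can be tried without HTML::TreeBuilder. It assumes the column widths (8 and 11 here) were already measured in the first pass, and `pad_cell` is a hypothetical helper written for illustration; the module inlines this logic and does not export such a function.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the padding rule of the second pass.
# $max_width is the widest line seen in this column during the first pass.
sub pad_cell {
    my ($text, $max_width, $cellpadding, $justify) = @_;
    my $width   = $max_width + $cellpadding;   # column width including padding
    my $n_space = $width - length $text;       # free space left in the column

    if ($justify eq 'center') {
        my $pre  = int( ($n_space + $cellpadding) / 2 );  # left share
        my $post = $n_space - $pre;                       # remainder goes right
        return (' ' x $pre) . $text . (' ' x $post);
    }
    elsif ($justify eq 'left') {
        return (' ' x $cellpadding) . $text . (' ' x $n_space);
    }
    # any other align value (e.g. 'right') is right-justified
    return (' ' x $n_space) . $text . (' ' x $cellpadding);
}

# One row: a right-aligned label column followed by a left-aligned value column.
my @widths = (8, 11);
my $row = pad_cell('Name:', $widths[0], 1, 'right')
        . pad_cell('Mr. Foo Bar', $widths[1], 1, 'left');
print "[$row]\n";
```

Working through the arithmetic shows one property of the rule as written: a centered cell (every <th>) comes out $cellpadding characters narrower than a left- or right-aligned cell in the same column, so header rows and body rows may not line up exactly.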
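The token technique that _format_tables() relies on can also be shown in isolation: each table row is rendered ahead of time, a unique placeholder (`__TOKEN__<table>__<row>__`, the same shape the module uses) is left where the row belongs, the general-purpose formatter runs over the whole document, and the placeholders are then swapped back for the pre-rendered rows. This is a minimal sketch under stated assumptions. `fake_formatter`, `token_for` and `render_with_tokens` are hypothetical names, and `fake_formatter` only indents lines as a stand-in for the real _parse(); deleting the other cells and the no_rowspacing handling are left out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real formatter (_parse() in the module):
# it simply indents every line by 3 spaces.
sub fake_formatter {
    my ($doc) = @_;
    my $out = '';
    $out .= "   $_\n" for split /\n/, $doc;
    return $out;
}

# Same token shape the module uses: __TOKEN__<table>__<row>__
sub token_for {
    my ($table, $row) = @_;
    return "__TOKEN__${table}__${row}__";
}

# Leave one token per pre-formatted row in the document, run the formatter,
# then swap every token for its row text.
sub render_with_tokens {
    my ($table_id, @rows) = @_;
    my @tokens = map { token_for($table_id, $_) } 0 .. $#rows;
    my $text   = fake_formatter(join "\n", @tokens);
    for my $j (0 .. $#rows) {
        my $pattern = quotemeta $tokens[$j];
        $text =~ s/$pattern/$rows[$j]/;   # row text is inserted literally
    }
    return $text;
}

print render_with_tokens(0, 'Name:  Mr. Foo Bar', 'Email: foo@bar.baz');
```

Because the formatter only ever sees short opaque tokens, it cannot rewrap or reflow the carefully padded rows; they are spliced in only after formatting is done.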